Marine spatial data infrastructures – Approaches on evaluation, design and implementation –
Dissertation for the degree of Doctor of Engineering (Dr.-Ing.) at the Faculty of Agricultural and Environmental Sciences of the University of Rostock
Submitted by: MSc Christian Rüh, born on May 17, 1985 in Pasewalk
Advisors: Prof. Dr.-Ing. Ralf Bill (University of Rostock); Dr.-Ing. Rainer Lehfeldt (Federal Waterways Engineering and Research Institute (BAW), Hamburg); Prof. Dr. Christophe Claramunt (Naval Academy Research Institute, France)
Date of submission: February 21, 2014 Date of viva: June 2, 2014
Rostock, June 30, 2014
Acknowledgements
This thesis would not have been possible without all the people who helped and supported me throughout it. Special thanks go to my supervisor Prof. Dr.-Ing. Ralf Bill, whose extensive experience and input were invaluable in the realization of this thesis. I am also very thankful to my examiners Dr.-Ing. Rainer Lehfeldt from the German Federal Waterways Engineering and Research Institute and Prof. Dr. Christophe Claramunt from the French Naval Academy Research Institute for their support, continued assistance and guidance. I would also like to thank the past and present members of the MDI-DE project, represented here by the subproject directors Johannes Melles (German Federal Maritime and Hydrographic Agency, BSH) and Peter Hübner (German Federal Agency for Nature Conservation, BfN). Furthermore, I would like to acknowledge the generous assistance of my colleagues at the Professorship for Geodesy and Geoinformatics at the University of Rostock. In particular, I would like to thank Dr.-Ing. Peter Korduan, whose depth of knowledge and devotion to technical research created a positive working atmosphere for me. Dr.-Ing. Görres Grenzdörffer was also helpful with his useful hints and advice. Many thanks also go to the anonymous reviewers of my papers for their advice and hints. My appreciation and thanks also go to my fiancée Sarah Seip for her continuous support and love; without her this thesis would not have been possible. In this context I would also like to thank Prof. Dr. med. H. W. S. Schroeder and his colleagues from the Greifswald University Hospital.
Abstract
Humanity is reliant on and attracted to the marine environment. In order to make use of its resources and to protect it, guidelines and directives have been and are being developed. Spatial data infrastructures (SDIs) can aid the implementation of such efforts. They allow, inter alia, administrative officers and scientists to publish data and prepare reports; they can also be used by the public and can contribute to political decision-making processes. In the marine domain such SDIs are called marine spatial data infrastructures (MSDIs), and Germany began developing one, called MDI-DE, in 2010. Other countries developed MSDIs well before 2010, which opens up the opportunity to learn from these approaches. To obtain a reasonably objective and comparable basis, an evaluation framework is needed. Such a framework applies the same procedure to each MSDI, so that no aspect is overlooked, and its results work out the pros and cons of each approach (potential pitfalls and things done well). This thesis develops such an evaluation framework to assess MSDIs and applies it to the MSDIs of Ireland, the UK, the USA, Canada and Australia. Another opportunity arising from the fact that Germany is building an MSDI for the first time is that its development can be based on and guided by a reference model. A reference model structures large and complex distributed systems such as MSDIs with the help of several viewpoints or submodels. These allow the focus to be placed on specific parts of an architecture and are necessary because different stakeholders have different interests in such a system. The reference model proposed in this thesis consists of five such submodels: business, role, process, architecture and implementation. Among other things, the reference model envisages setting up infrastructure nodes with distributed services. Services are the basis on which an SDI operates, and they are also required by the INSPIRE directive.
INSPIRE sets performance requirements so that services are conveniently accessible. Furthermore, INSPIRE requires data and metadata to follow a specific structure. The same is true for the services themselves, because they have to follow given standards and specifications, e.g. by the International Organization for Standardization (ISO) and the Open Geospatial Consortium (OGC). This thesis uses existing tools to monitor and evaluate services and attempts to clarify whether the results of the tools are comparable and whether the INSPIRE requirements can be evaluated with such tools. Lastly, terms, which are combined in so-called controlled vocabularies or thesauri, are an important aspect of MSDIs in particular, because MSDIs are more scientifically oriented and interdisciplinary than terrestrial SDIs. The existing vocabularies could neither be used by systems (e.g. for metadata annotation) nor be maintained by marine experts (e.g. by using a web authoring tool). To allow such uses, this thesis implements a tool to convert the vocabularies into the Simple Knowledge Organisation System (SKOS) format. The conversion into SKOS allows the vocabularies to be imported into an online thesaurus management tool. Altogether, by focusing on specific aspects of the evaluation, design and implementation of marine spatial data infrastructures, this thesis aims to support the development of the German approach for the MDI-DE scientifically.
Keywords: Spatial data infrastructure, marine, INSPIRE, reference model, services, modelling
Summary (Zusammenfassung)
Humankind is both attracted to and reliant on the marine environment. To protect it and to use its resources, directives have been and are being developed whose requirements can be met, among other means, with spatial data infrastructures (SDIs). These enable, inter alia, responsible case officers or scientists to publish data and prepare reports, but they can also be used by the public or contribute to political decision-making processes. In the marine domain, such SDIs are called marine spatial data infrastructures (MSDIs), and Germany has been developing one, called MDI-DE, since 2010. Other countries developed MSDIs well before 2010, which opened up the opportunity to learn from these approaches. To have a relatively objective basis, an evaluation framework is needed. It makes it possible to always proceed in the same way when analysing the MSDIs and thus to work out the advantages and disadvantages of the existing approaches as a result of the evaluations. This thesis designs such an evaluation framework for MSDIs and applies it to the MSDIs of Ireland, the UK, the USA, Canada and Australia. A further opportunity arising from the fact that Germany is building an MSDI for the first time is the possibility of developing it on the basis of a reference model. A reference model allows large and complex distributed systems, such as MSDIs, to be structured with the help of several submodels. These make it possible, among other things, to focus on specific parts of an architecture. The reference model built in this thesis is divided into the submodels business, role, process, architecture and implementation model. Among other things, the reference model provides for setting up infrastructure nodes with services.
Services are also required by the INSPIRE directive, which additionally defines requirements regarding the performance of services. Moreover, services must conform to given standards and specifications of the International Organization for Standardization and the Open Geospatial Consortium. This thesis relies on existing tools for monitoring and evaluating services and examines whether the results of the tools are comparable and whether the INSPIRE requirements can be evaluated with such tools. Finally, terms, which are compiled in so-called controlled vocabularies or thesauri, are an important aspect of MSDIs in particular, since MSDIs are more scientifically oriented and more interdisciplinary than terrestrial SDIs. The existing vocabularies could neither be used by systems (e.g. for describing metadata) nor be maintained collaboratively by scientists. To enable such uses, this thesis develops a tool that converts vocabularies into the Simple Knowledge Organisation System (SKOS) format, which allows the thesauri to be imported into a web thesaurus management tool. Overall, by addressing specific aspects of the evaluation, design and implementation of marine data infrastructures, this thesis aims to support the development of the German approach for the MDI-DE scientifically.
Keywords: spatial data infrastructure, marine, INSPIRE, reference model, services, modelling
Contents
1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Outlook on the thesis
2 Fundamentals, basic concepts and standards
2.1 Spatial Data Infrastructures
2.1.1 Interoperability
2.1.2 SDI definition
2.1.3 Components of an SDI
2.1.4 Classification of SDIs
2.1.5 Marine SDIs
2.2 Geospatial standards
2.2.1 ISO TC 211 and its 191XX series
2.2.2 Metadata standards
2.2.3 OGC specifications
2.3 Standards for reference models
2.3.1 RM-ODP
2.3.2 The “4+1” View Model of Software Architecture
2.3.3 Use of UML in reference models
2.4 Standards for knowledge representation
2.4.1 Fundamentals – XML and DOM
2.4.2 Ontologies
2.5 Directives in the marine and SDI domain
2.5.1 INSPIRE
2.5.2 Water Framework Directive (WFD)
2.5.3 Marine Strategy Framework Directive (MSFD)
2.6 Conclusions
3 Existing approaches and established systems
3.1 Germany: MDI-DE
3.2 International MSDIs
3.2.1 Australia
3.2.2 Canada
3.2.3 Ireland
3.2.4 United Kingdom
3.2.5 United States of America
3.3 Reference models for SDIs
3.3.1 Selected reference models in Germany
3.3.2 WRON Reference Model (WRON-RM)
3.3.3 Digital Earth Reference Model (DERM)
3.3.4 Conclusions
3.4 Existing marine vocabularies
3.4.1 Küste
3.4.2 NOKIS
3.4.3 LANIS Habitat Mare (LHM)
3.5 SKOS Tools
3.5.1 Conversion Tools
3.5.2 Web based Thesaurus Management Tools
3.6 Tools to evaluate performance and conformity of services
3.6.1 Quality of Service
3.6.2 Tools concerning conformity
3.6.3 Tools testing performance and availability
3.6.4 GDI-DE Testsuite
4 Evaluation of existing MSDIs
4.1 Building an evaluation framework
4.1.1 Bases for the framework
4.1.2 Compiling the framework
4.1.3 Description of the indicators
4.1.4 Assessment of the so far found indicators
4.2 International case studies
4.2.1 Ireland
4.2.2 United Kingdom
4.2.3 Australia
4.2.4 Canada
4.2.5 United States of America
4.2.6 Germany
4.3 Conclusions
5 Selected implementation aspects of an interoperable architecture
5.1 Lessons learned from (M)SDIs
5.1.1 Use of RM-ODP
5.1.2 Use of UML
5.1.3 Architectural aspects found in other infrastructures
5.1.4 Resulting requirements to construct an architecture
5.2 Reference model
5.2.1 Composition
5.2.2 Exemplary scenario
5.3 Analysis of existing data sets and services
5.3.1 Creation of a database schema
5.3.2 Registration of data sets and services
5.3.3 Presentation of data sets and services
5.4 Evaluation of MDI-DE services
5.4.1 Conformity with INSPIRE and OGC
5.4.2 Performance and availability
5.5 Visualization of Service Status Checker monitoring results
5.5.1 SSC API and results
5.5.2 Same Origin Policy Problem
5.5.3 Creating Diagrams with Flot
5.5.4 HTML5
6 Future prospects
Appendices
A Selected listings
A.1 Analysis of existing data sets
A.2 SSCVisualizer
A.3 JSKOSify
B In-depth evaluations of MSDIs
B.1 Australia
B.2 Canada
B.3 USA
B.4 United Kingdom
B.5 Germany
C Theses
D List of own publications
List of Tables
List of Figures
List of Listings
Index
Bibliography
List of abbreviations
ACZISC: Atlantic Coastal Zone Information Steering Committee (Canada)
AJAX: Asynchronous JavaScript and XML
AMSIS: Australian Marine Spatial Information System
ANZLIC: Australia New Zealand Land Information Council
AODN: Australian Ocean Data Network
API: Application Programming Interface
ArcIMS: Esri Arc Internet Map Server
ASDI: Australian Spatial Data Infrastructure
BAW: Federal Waterways Engineering and Research Institute (Bundesanstalt für Wasserbau)
BfG: Federal Institute of Hydrology Germany (Bundesanstalt für Gewässerkunde)
BfN: Federal Agency for Nature Conservation (Bundesamt für Naturschutz)
BODC: British Oceanographic Data Centre
BSH: Federal Maritime and Hydrographic Agency (Bundesamt für Seeschifffahrt und Hydrographie)
CABIN: Canadian Aquatic Biomonitoring Network
CAMRA: Coastal and Marine Resource Atlas (UK)
CARIS: Computer Aided Resource Information System (Marine Cartography and Geodesy Tools, Canada)
CGDI: Canadian Geospatial Data Infrastructure
CHS: Canadian Hydrographic Service
CMRC: Coastal & Marine Resources Centre at University College Cork
CMSP: Coastal and Marine Spatial Planning Data Registry (USA)
COIN: Coastal and Ocean Information Network
COINAtlantic: Coastal and Ocean Information Network for Atlantic Canada
COSYNA: Coastal Observing System for Northern and Arctic Seas
CSDGM: Content Standard for Digital Geospatial Metadata
CSDI: Coastal Spatial Data Infrastructure
CSIRO: Commonwealth Scientific and Industrial Research Organisation (Australia)
CSV: Comma-separated values
DBMS: Database management system
DCENR: Department of Communications, Energy and Natural Resources (Ireland)
Defra: Department for Environment, Food and Rural Affairs (UK)
DEM: Digital elevation model
DERM: Digital Earth Reference Model
DFO: (Department of) Fisheries and Oceans Canada
DOM: Document Object Model
EAF: Ecosystem Approach to Fisheries
EBF: Ecosystem Based Management
EEZ: Exclusive economic zone
ENC: Electronic navigational chart
EPA: Environmental Protection Agency
EPSG: European Petroleum Survey Group
EU: European Union
FFH: Fauna and flora directive (Fauna-Flora-Habitat-Richtlinie)
FGDC: Federal Geographic Data Committee
FOSS: Free and Open-Source Software
GCMD: Global Change Master Directory
GDI-DE: Spatial data infrastructure for Germany
GDP: GeoConnections Discovery Portal (Canada)
GEMET: GEneral Multilingual Environmental Thesaurus
GeoConnections: A nation-wide program of the federal Department of Natural Resources, Canada
GMES: Global Monitoring for Environment and Security (now Copernicus)
GML: Geography Markup Language
GOOS: Global Ocean Observing System
GUI: Graphical user interface
HELCOM: Baltic Marine Environment Protection Commission (Helsinki Commission)
HydroML: Hydrologic Markup Language
ICM: Integrated Coastal Management
ICOM: Integrated Coastal and Ocean Management
ICZM: Integrated Coastal Zone Management
IHO: International Hydrographic Organisation
IMOS: Integrated Marine Observing System
INSPIRE: Infrastructure for Spatial Information in the European Community
IOC: Intergovernmental Oceanographic Commission
IODE: International Oceanographic Data and Information Exchange
IOOS: Integrated Ocean Observing System
IRLOGI: Irish Organisation for Geographic Information
ISO: International Organization for Standardization
JDOM: (not an acronym) Java API for XML documents
JSON: JavaScript Object Notation
JSONP: JSON with Padding
LANIS: Landscape and Nature Protection Information System (Landschafts- und Naturschutzinformationssystem)
LHM: LANIS Habitat Mare
LKN: Agency for Coastal Defence, National Park and Marine Conservation (Landesbetrieb Küstenschutz, Nationalpark und Meeresschutz)
LUNG: Agency for the Environment, Nature Conservation and Geology (Landesamt für Umwelt, Naturschutz und Geologie)
MAGIC: Multi-Agency Geographic Information for the Countryside (UK)
MDI-DE: German marine spatial data infrastructure (Marine Dateninfrastruktur Deutschland)
MEDIN: Marine Environmental Data and Information Network (UK)
MGDI: Marine Geospatial Data Infrastructure (Canada)
MMI: Marine Metadata Interoperability Project
MSDI: Marine Spatial Data Infrastructure
MSFD: Marine Strategy Framework Directive
Natura2000: Ecological network of protected areas (EU)
NCRIS: National Collaborative Research Infrastructure Strategy (Australia)
NOAA: National Oceanic and Atmospheric Administration (USA)
NOKIS: North-Baltic-Sea-Coastal-Information-System (Nord-Ostsee-Küsten-Informations-System)
NRC: National Research Council Canada
OASIS: Organization for the Advancement of Structured Information Standards
OGC: Open Geospatial Consortium
OSM: OpenStreetMap
OSPAR: Convention for the Protection of the Marine Environment of the North-East Atlantic (Oslo-Paris-Convention)
OWL: Web Ontology Language
OWS: OGC Web Service
PPMCC: Pearson product-moment correlation coefficient
QoS: Quality of Service
RDF: Resource Description Framework
RM-ODP: Reference Model of Open Distributed Processing
SDI: Spatial Data Infrastructure
SEIS: Shared Environmental Information System
SKOS: Simple Knowledge Organisation System
SOA: Service-oriented architecture
SQL: Structured Query Language
SSC: (FGDC) Service Status Checker
UBA: Federal Environmental Agency (Umweltbundesamt)
UML: Unified Modeling Language
UMTHES: Environmental (Umwelt) Thesaurus by the UBA
UNCLOS: United Nations Convention on the Law of the Sea
VS-RL: Birds Directive (Vogelschutzrichtlinie)
W3C: World Wide Web Consortium
WFD: Water Framework Directive
WFS: Web Feature Service
WISE: Water Information System for Europe
WMS: Web Map Service
WPS: Web Processing Service
WRON: Water Resources Observation Network (Australia)
WSDL: Web Services Description Language
XHR: XMLHttpRequest
XML: Extensible Markup Language
XSLT: Extensible Stylesheet Language Transformations
1 Introduction
“Humankind is extremely reliant on the oceans, as a source of food and raw materials, as a climate regulator, for transportation, for disposal of waste products, and for recreation.” (Strain et al., 2006, p. 431)
Marine environments are very important to humankind: they provide exploitable resources and habitats, and the goods that industries produce are shipped across the seas. Furthermore, coastal environments in particular are valued living spaces and recreational areas. These are among the reasons why “[. . .] half the world’s population lives within 60 km of the shoreline [. . .] ”1. The downside, however, is “[. . .] environmental modification and deterioration through landfill, dredging, and pollution caused by urban, industrial, aquaculture and agricultural activities.”2 The marine environment is therefore in danger, and foresighted management and action are needed. Such actions are derived, inter alia, from directives that demand continuous monitoring efforts and periodic reports. A spatial data infrastructure (SDI) can support the fulfillment of these directives’ requirements. It can help administrative officers and scientists to find the data they need, to publish data so that other users can use them, and to prepare reports that reflect the state of marine environments. Politicians, environmental agencies and others can base decisions on these reports and on the data an SDI makes available. Furthermore, an SDI is an instrument to inform the public.
1.1 Motivation
Germany, in contrast to other countries, did not have a marine spatial data infrastructure (MSDI) until 2014. To make data access easier and to merge information on different topics, such as coastal engineering, hydrography and surveying, protection of the marine environment, maritime conservation, regional planning and coastal research, the Federal Ministry of Education and Research (BMBF) funded the project MDI-DE3 in order to develop an MSDI for Germany. Easier data access should support institutions and authorities in their daily work, because it makes it easier for employees and scientists to find the data they need. Apart from easier data access and merged information through a central geoportal, reporting under specific marine directives is a relevant aspect, and it will become even more important in the future when the directives are implemented and require data and reports on a specific time cycle. At the European level, Germany has to report under the INSPIRE4 (Infrastructure for Spatial Information in the European Community) directive as well as the Marine Strategy Framework Directive5 (MSFD), the Water Framework Directive6 (WFD) and Natura2000, with their regulatory counterparts for Germany and its federal states (Meeresstrategie-Rahmenrichtlinie [MSRL], Wasserrahmenrichtlinie [WRRL], Fauna-Flora-Habitat-Richtlinie [FFH-RL], Vogelschutzrichtlinie [VS-RL]). A central geoportal will help to comply with the reporting requirements because of its standardization and harmonization and its easy, centralized data access. Furthermore, since the development of an MSDI brings marine experts together, they can jointly define and implement how to report under the directives (e.g. data harmonization is needed so that biological and chemical parameters are represented in a comparable way). Data harmonization also requires metadata harmonization. Terms are of great importance not only for data and metadata but for many other aspects. Marine experts create terms and merge them into thesauri so that the terms are clear and unambiguous. Since Germany has not had an MSDI so far, several marine thesauri developed by different institutions exist, which means that terms may be included several times and that their definitions may vary.
1 (Bartlett et al., 2004, p. 2)
2 (Bartlett et al., 2004, pp. 2)
3 www.mdi-de.org
1.2 Objectives
The previous section outlined what is missing and what is worth working on. The objectives presented in this section derive from these aspects. Note that this thesis can only implement or design selected aspects of an (M)SDI.
MSDI from scratch Building an initial MSDI in Germany is both challenging and promising. On the one hand, it is challenging because of all the coordination and effort that have to be put into such a development. On the other hand, it is promising because standards have matured, state-of-the-art technologies can be used and other countries have already been working on MSDIs. The first step is therefore to look for pre-existing MSDIs in order to learn from them. The second step is to analyze them. Between these two steps, however, lies an intermediate one: it first has to be defined how to analyze the MSDIs so that they can be compared, which makes learning from them easier. The final step is to extract potential pitfalls on the one hand and advantages or good ideas on the other.
4 http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32007L0002:EN:NOT
5 http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32008L0056:EN:NOT
6 http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32000L0060:EN:NOT
Easier and central data access Apart from learning from other infrastructure initiatives, an early step in (M)SDI development is to establish what already exists. An SDI brings together many actors (authorities, institutions etc.) and therefore much data and metadata (stored in so-called infrastructure nodes). Before development begins, all data and metadata sets have to be known in order to see whether and how they fit into the infrastructure. If the data and metadata are not yet available through services, an early step in SDI development is to set up services, so that files no longer have to be transferred and the most up-to-date version of data and metadata is available from a single source that every stakeholder can access.
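To illustrate what “data available through services” means in practice, the following minimal Python sketch builds key-value-pair request URLs for an OGC Web Map Service (WMS 1.3.0), one of the OGC service types used in SDIs. The base URL `https://example.org/wms` and the layer names are hypothetical placeholders, not MDI-DE endpoints; the parameter names follow the WMS 1.3.0 specification. Note that under WMS 1.3.0 the bounding box for EPSG:4326 is given in latitude/longitude order.

```python
from urllib.parse import urlencode


def wms_capabilities_url(base_url):
    """Build a WMS 1.3.0 GetCapabilities URL (service self-description)."""
    params = {"SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetCapabilities"}
    return base_url + "?" + urlencode(params)


def wms_getmap_url(base_url, layers, bbox, width=800, height=600,
                   crs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.3.0 GetMap URL using key-value-pair encoding.

    bbox is (minx, miny, maxx, maxy) in the axis order of `crs`; for
    EPSG:4326 under WMS 1.3.0 this means (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style of every layer
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    # urlencode percent-encodes reserved characters such as ',', ':' and '/'
    return base_url + "?" + urlencode(params)


# Usage with hypothetical layers covering roughly the German Bight and western Baltic
print(wms_getmap_url("https://example.org/wms", ["sediments", "depth"],
                     (53.0, 6.0, 55.5, 14.5)))
```

Every stakeholder who sends such a request receives the map rendered from the data owner's current data, which is exactly why no files need to be passed around.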
Reporting Once the data are easily available to every stakeholder through services, actors can use them, for instance, to comply with reporting requirements. For this use, aspects such as data modelling (i.e. what the data have to look like) and other formal requirements concerning data and metadata are important on the one hand. On the other hand, if data have to be published or transmitted via services, there are also requirements concerning service quality: not only do users expect availability and performance, but reports also have to be prepared based on the services, which means that the services must respond within an acceptable time frame and be permanently available.
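The service-quality requirement can be made concrete by summarizing monitoring probes against two thresholds. The Python sketch below is a minimal illustration, not one of the monitoring tools discussed later; the `Probe` structure is hypothetical, and the default thresholds (5 s response time, 99 % availability) are assumptions modelled on the view-service figures of the INSPIRE network services regulation, which should be checked against the regulation itself.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Probe:
    """One monitoring request (hypothetical structure): answered? response time in s."""
    ok: bool
    response_time_s: float


def summarize(probes, max_response_s=5.0, min_availability=0.99):
    """Summarize probes against response-time and availability thresholds.

    The default thresholds are assumptions, not authoritative INSPIRE values.
    """
    if not probes:
        raise ValueError("at least one probe is required")
    answered = [p for p in probes if p.ok]
    availability = len(answered) / len(probes)
    # Only answered requests have a meaningful response time
    slow = sum(1 for p in answered if p.response_time_s > max_response_s)
    avg = mean(p.response_time_s for p in answered) if answered else float("nan")
    return {
        "availability": availability,
        "mean_response_s": avg,
        "slow_responses": slow,
        "meets_availability": availability >= min_availability,
        "meets_response_time": slow == 0,
    }


# Usage: four probes, one failed and one too slow
print(summarize([Probe(True, 1.2), Probe(True, 6.0), Probe(False, 0.0), Probe(True, 2.0)]))
```

Keeping the thresholds as parameters lets the same summary be applied to different service types, whose requirements differ.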
Thesauri harmonization For reporting, but also for other tasks such as metadata annotation and search, terms are important because they have to be clearly defined so that everybody can be certain what is meant by a specific term. These terms come from thesauri. The first task is to identify the thesauri handling marine terms in Germany. If more than one thesaurus exists, the thesauri have to be harmonized so that terms are not defined multiple times and, above all, not defined differently. To be able to use the thesauri (or the harmonized thesaurus) for other tasks, and in particular to ease the harmonization process through an editorial system, a web-based thesaurus management system is used, into which the thesauri are imported first.
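The harmonization step can be supported by a simple consistency check. The Python sketch below, with hypothetical thesaurus names and entries, lists terms that occur in more than one thesaurus and, among those, the terms whose definitions differ. It compares terms case-insensitively and definitions literally; this is a deliberate simplification, since real harmonization still needs expert review of synonyms and near-identical wording.

```python
from collections import defaultdict


def compare_thesauri(thesauri):
    """Find terms defined in several thesauri and flag conflicting definitions.

    thesauri: mapping thesaurus name -> {term: definition}
    Returns (duplicates, conflicts); both map a normalized term to
    {thesaurus name: definition}.
    """
    occurrences = defaultdict(dict)
    for name, terms in thesauri.items():
        for term, definition in terms.items():
            key = term.strip().casefold()  # "Dune" and "dune" count as one term
            occurrences[key][name] = definition.strip()
    duplicates = {k: v for k, v in occurrences.items() if len(v) > 1}
    # A conflict is a duplicate whose definitions are not all identical
    conflicts = {k: v for k, v in duplicates.items() if len(set(v.values())) > 1}
    return duplicates, conflicts


# Usage with hypothetical entries
dup, con = compare_thesauri({
    "thesaurus_a": {"Tidal flat": "Coastal area flooded at high tide", "Dune": "Sand hill"},
    "thesaurus_b": {"tidal flat": "Coastal area flooded at high tide", "Dune": "Wind-blown sand ridge"},
})
print(sorted(dup), sorted(con))
```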
1.3 Outlook on the thesis
Figure 1.1 gives an overview of this thesis and shows how the chapters and sections are related to one another.
[Figure 1.1: Chapters of this thesis and their relationships. The diagram groups the sections of Chapters 1 to 6 and Appendices A (Sec. A.1 to A.3) and B (Sec. B.1 to B.5) and pairs each section of Chapter 3 with its counterpart in Chapter 5: Sec. 3.1 with 5.3, 3.2 with 5.1, 3.3 with 5.2, 3.4 with 5.6 (Requirements for a marine thesaurus), 3.5 with 5.7 (Converting vocabularies to SKOS), and 3.6 with 5.4 and 5.5.]
Chapter 2 – Fundamentals, basic concepts and standards Based on the motivation of this thesis and its objectives, a range of technologies, standards and concepts is needed to meet the objectives. Because MDI-DE is a spatial data infrastructure (SDI), it is important to know the components of an SDI, what interoperability means and what the unique features of marine SDIs (in comparison to terrestrial SDIs) are. Since SDIs are based on services and would not be possible without them, standards for services are presented together with standards for the underlying data and metadata. MDI-DE is developed from scratch on the basis of these standards. This means that a reference model can be used to aid and support the development of MDI-DE, which is why standards for reference models are also presented in this chapter. Lastly, directives are an important driver for SDI development, especially in the marine domain; the most important directives are therefore presented at the end of this chapter.
Chapter 3 – Existing approaches and established systems This chapter describes systems and software tools that were built using the technologies and standards of the preceding chapter. First, MDI-DE is introduced, as it was developed alongside this thesis and is now a completed project. After that, other (international) MSDI approaches are presented. Furthermore, existing reference models are described. Thesauri and controlled vocabularies are an aspect of SDIs and spatial data, which is why existing vocabularies and tools to convert and present vocabularies on the web are described. The chapter closes with an overview of tools to evaluate the performance and conformity of services, because services are an integral part of SDIs.
Chapter 4 – Evaluation of existing MSDIs First, this chapter builds a framework that enables a (to a certain degree) objective evaluation of SDIs. Evaluating other existing SDIs (in contrast to MDI-DE, which is built from scratch) is important because it makes them comparable and highlights potential pitfalls as well as aspects and concepts worth incorporating into one's own approach. These pros and cons are elaborated in the following section, which evaluates existing MSDIs.
Chapter 5 – Selected implementation aspects of an interoperable architecture This chapter synthesizes the findings so far and uses them as the basis for implementing selected aspects. First, it presents the lessons that can be learned from other (M)SDIs and builds a reference model for MDI-DE based on these findings. Because the reference model lists all actors of MDI-DE that provide data sets and, more importantly, services, an overview of these was needed at the beginning of the project. This overview was achieved through web forms and tables. Once the existing services were known and additional ones had been set up based on existing data sets, the performance and conformity of the services became important, which is why the services are evaluated afterwards. The service evaluation showed a need to visualize the results of the Service Status Checker in order to simplify the evaluation of services with it. The prerequisites for setting up services are data sets and the corresponding metadata sets. Thesauri are important especially for metadata annotation, but also for services (e.g. keywords) and for the MDI-DE portal itself (e.g. its search function). The penultimate section formulates requirements for building a marine thesaurus that supports the functionalities just mentioned. These requirements form the basis for the actual implementation of a marine thesaurus, which is the last implementation presented in this chapter.
Chapter 6 – Future prospects The last chapter provides an outlook on what MSDIs will look like in the future. It also details what additional features can be implemented and how certain aspects of this thesis can be improved in the future.
Appendices – Selected listings & in-depth evaluations of MSDIs Appendix A provides listings of the implementations SSCVisualizer and JSKOSify as well as the forms and tables used to analyse the existing data sets and services. Appendix B documents the lengthy evaluations of the MSDIs of Australia, Canada, the UK and the USA as well as a self-assessment of Germany.
2 Fundamentals, basic concepts and standards
“Interoperability among components of large-scale, distributed systems is the ability to exchange services and data with one another.” (Heiler, 1995, p. 271)
Interoperability is the basis for SDI development; it is what makes SDIs possible. (Staub, 2009, p. 20) states that interoperability has technical as well as organizational aspects, which are depicted in figure 2.1. The figure also shows five elements that characterize interoperability, form the foundations of this thesis and are discussed in this chapter:
[1] Directives and laws
[2] Standards and norms
[3] Profiles/Data modelling
[4] Data transfer/Services
[5] Semantic transformation
Figure 2.1: Interoperability (modified after (Staub, 2009, p. 20)): the elements directives and laws, standards and norms, profiles/data modelling, semantic transformation and data transfer/services arranged around spatial data infrastructures and interoperability, spanning an organizational and a technical side
Chapter 1 already expressed the goal of setting up several infrastructure nodes for the project, which means that the participants retain control over and responsibility for their own data sets. An infrastructure with spatial data and a network of distributed nodes is called a spatial data infrastructure (SDI); if the data sets handle spatial information from the marine domain, it is called a marine (spatial) data infrastructure (M[S]DI). Both terms are explained in depth in section 2.1. All these nodes use web services ([4] data transfer/services), so that the data owners do not have to push data files back and forth while trying to keep track of which version is the most up to date. For an SDI to work, its web services have to be able to communicate with each other. To achieve this, web services have to be based on standards ([2] standards and norms); the standards of the spatial domain are explained in section 2.2. With web services, standards for web services and nodes relying on web services in place, the architecture that composes the network of nodes must be considered. Since the nodes interact through services, this is a service-oriented architecture (also defined in section 2.2). It can be modelled with the help of a reference model based on the ISO standard Reference Model of Open Distributed Processing (RM-ODP), which is explained in section 2.3. Several European directives ([1] directives and laws) in the marine and SDI domains affect many participants of the infrastructures, mainly through reporting obligations that recur over specified time spans; these directives are discussed in detail in section 2.5. So far, data have been referred to indirectly in almost every aspect addressed. But with data comes – or should come – metadata, which describe the data and make them discoverable. What if somebody wrote “caost” instead of “coast” as a keyword in a metadata set? Nobody would be able to find this data set with the keyword “coast”.
Likewise, somebody interested in coastal data may also be interested in data about beaches but cannot find them when searching with the keyword “coast”. Besides data and metadata, knowledge representation is therefore needed ([3] profiles/data modelling and [5] semantic transformation): it ensures that keywords can only be picked from a controlled keyword list and connects terms with other terms of similar meaning. Important standards and approaches such as SKOS, RDF and ontologies are discussed in section 2.4.
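The controlled-vocabulary idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesaurus implementation discussed later in this thesis: the vocabulary `VOCABULARY`, its related-term links and the helper names `validate_keyword` and `expand_query` are hypothetical, and close-match suggestions use the standard library's `difflib.get_close_matches` as a stand-in for a real spell checker.

```python
import difflib

# Hypothetical controlled vocabulary: each allowed term maps to related terms.
# Terms and relations are illustrative only, not taken from a real thesaurus.
VOCABULARY = {
    "coast": {"beach", "shore"},
    "beach": {"coast"},
    "shore": {"coast"},
}


def validate_keyword(keyword):
    """Accept a keyword from the vocabulary, otherwise return close-match suggestions."""
    if keyword in VOCABULARY:
        return keyword, []
    # difflib ranks vocabulary terms by string similarity (default cutoff 0.6).
    suggestions = difflib.get_close_matches(keyword, list(VOCABULARY))
    return None, suggestions


def expand_query(keyword):
    """Expand a search keyword with its related terms."""
    return {keyword} | VOCABULARY.get(keyword, set())
```

With this vocabulary, `validate_keyword("caost")` rejects the misspelling and suggests `["coast"]`, while `expand_query("coast")` also yields `"beach"` and `"shore"`, so a search for coastal data would reach beach data as well.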
2.1 Spatial Data Infrastructures
Spatial data infrastructures (SDIs) are about separate systems/nodes working together, and interoperability is the ability of systems and/or organizations to work together. Interoperability is therefore a prerequisite for SDI development, which is why it is discussed first (subsection 2.1.1). After that, the term spatial data infrastructure is defined (subsection 2.1.2) and the components of an SDI are derived from this definition (subsection 2.1.3). The section closes with a classification of SDIs showing the broad fields an SDI can be applied to (subsection 2.1.4), with marine SDIs picked as an example of a thematic SDI (subsection 2.1.5).
2.1.1 Interoperability
As already mentioned in the introduction of this chapter, many data owners still face two problems: on the one hand, their files come in various (spatial) formats, causing incompatibility issues; on the other hand, the files are stored on the employees' workstations, so it is hard to keep track of the most up-to-date version of a file or of the whereabouts of files and data in general. This is why interoperability is needed, which (Heiler, 1995, p. 271) defines as “[. . .] the ability [of systems] to exchange services and data with one another. It is based on agreements between requesters and providers on, for example, message passing protocols, procedure names, error codes, and argument types.” As stated in the introduction of this chapter, interoperability is divided into two main areas: organizational and technical interoperability¹. Technical interoperability in turn has two characteristics: semantic and syntactic interoperability. (Najar, 2006, p. 61) states that “semantic Interoperability is a special kind of interoperability which provides systems with the ability of access, consistently and coherently, to similar (though autonomously defined and managed) classes of digital data, objects and services distributed across heterogeneous repositories.” Since this definition is rather complex and specific, the more general, user-focused definition of (Kresse et al., 2012, p. 407) is given as well: “Semantic interoperability is defined as the ability of a user to fully understand the data received in a data exchange in order to be able to make full use of those data if needed.” (Danko, 2008, p. 657) simplifies semantic interoperability even further by stating that it is about “[. . .] understanding the same term for the same concept.” This is the definition used in the rest of this thesis (e.g. in section 2.4). Syntactic interoperability, the second technical characteristic of interoperability, “[. . .] allows the interoperable use of available data through a standard interface (through OGC web services). This interface is accessed through a standardized protocol and returns the information in a standard format. Query and delivery of data occurs in the structure of the provider model. The data structure of available data cannot be influenced by the user.”²
Standards are the basis of an SDI, as (Tóth et al., 2012, p. 20 and p. 22) underline: “Interoperability arrangements and data harmonisation in SDIs aim to eliminate incompatibility and inconsistency of data, thereby exempting the users from having to undertake onerous data manipulations before they start using data in their applications. [. . .] The interoperability in an SDI means that users are able to integrate spatial data from disparate sources “without repetitive manual intervention”, i.e. the datasets they retrieve from the infrastructure follow a common structure and shared semantics.” This emphasises that an SDI based on interoperable web services (explained in section 2.2) rules out the problems of technical (not thematic) incompatibility and inconsistency.
¹ cf. (Staub, 2009, p. 20)
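To make the “standard interface” of syntactic interoperability concrete, the following sketch builds WMS 1.3.0 requests as key-value pairs, the way any client can address any compliant server. It is illustrative only: the endpoint `https://example.org/wms`, the layer name `sea_regions` and the helper `wms_request` are hypothetical, whereas `SERVICE`, `VERSION`, `REQUEST`, `LAYERS`, `STYLES`, `CRS`, `BBOX`, `WIDTH`, `HEIGHT` and `FORMAT` are standard WMS request parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a compliant WMS would accept the same parameters.
BASE_URL = "https://example.org/wms"


def wms_request(request, **extra):
    """Build a WMS 1.3.0 key-value-pair request URL (hypothetical helper)."""
    params = {"SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": request}
    params.update(extra)
    # urlencode percent-encodes reserved characters such as ':' ',' and '/'.
    return BASE_URL + "?" + urlencode(params)


capabilities_url = wms_request("GetCapabilities")
map_url = wms_request(
    "GetMap",
    LAYERS="sea_regions",        # hypothetical layer name
    STYLES="",                   # empty value requests the default style
    CRS="EPSG:4326",
    BBOX="53.0,9.0,56.0,14.5",   # WMS 1.3.0 with EPSG:4326 uses lat,lon axis order
    WIDTH=800,
    HEIGHT=600,
    FORMAT="image/png",
)
```

Because the parameters, not the server software, define the request, the same function works against any provider's WMS; only the base URL changes.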
2.1.2 SDI definition
To define what a spatial data infrastructure is, some starting points can already be taken from the citations of (Tóth et al., 2012) in subsection 2.1.1. In addition, (Tóth et al., 2012, p. 21) states: “SDIs should encompass the common spatial aspects constituting a generic location context for a wide variety of applications.” This yields three basic items towards a definition of the term spatial data infrastructure, all of which concern the access and sharing an SDI provides:
• No data manipulations needed by users
• Datasets follow a common structure and shared semantics and are retrievable through the infrastructure
• SDIs should include common spatial aspects to offer a generic location context for a wide variety of applications
This, however, is only a very small excerpt of what an SDI is and what constitutes it. (McGranaghan, 2003), citing (Groot & McLaughlin, 2000), gives a broader but still very concise definition of the term SDI: “Geospatial Data Infrastructure encompasses the networked geospatial databases and data handling facilities, the complex of institutional, organizational, technological, human, and economic resources which interact with one another and underpin the design, implementation, and maintenance of mechanisms facilitating the sharing, access to, and responsible use of geospatial data at an affordable cost for a specific application domain or enterprise.” The aspect of data access and sharing appears again, but so does a wide range of other aspects:
• Technology: networked geospatial databases and data handling facilities
• Organisation/Policy: complex of institutional, organizational, technological, human, and economic resources
• Cost: affordable cost
• People/Users: specific application domain or enterprise
² cf. (Staub, 2009, p. 25) (translated)
2.1.3 Components of an SDI
As can be seen, these aspects are already categorized, which suggests possible components of an SDI. For two of the aspects, the GSDI Cookbook (Nebert, 2004, p. 8) gives more in-depth information in its comprehensive definition of the term. First, it describes the previously mentioned users of an SDI more precisely as “[. . .] users and providers within all levels of government, the commercial sector, the non-profit sector, academia and by citizens in general”. Second, it specifies the organizational or policy aspect in the sense that an SDI “[. . .] must also include the organisational agreements needed to coordinate and administer it on a local, regional, national, and or transnational scale.” The findings so far align with the components of an SDI found in (Rajabifard & Williamson, 2001, pp. 4), which can be seen in figure 2.2: people, data, access network, policy and standards. All of these except standards and access networks were mentioned more or less directly; but since access to and sharing of data require both standards and access networks, these two components were already implicitly included in the definitions so far.
Figure 2.2: Components of an SDI: people, policy, data, standards and access network (modified after (Rajabifard & Williamson, 2001))
The GSDI Cookbook (Nebert, 2004, p. 8) digs deeper into the components of an SDI, as it did with the other aspects mentioned. It lists the components as:
• metadata (geographic data and attributes, sufficient documentation),
• catalogues and web mapping (discovery, visualization, evaluation),
• access and
• additional services for data application.
This leads to an extended view of the components of an SDI, resulting in the refined figure 2.3.
Figure 2.3: Components of an SDI expanded with aspects of (Nebert, 2004): people, policy, standards, metadata, data, access network and applications (modified after (Rajabifard & Williamson, 2001))
2.1.4 Classification of SDIs
(Rajabifard & Williamson, 2001) also point out that SDIs exist at different political-administrative levels, which makes hierarchy the first basis for classification. Figure 2.4 illustrates these levels and shows that there are vertical and horizontal relationships between them. (Rajabifard & Williamson, 2001), however, only mention the horizontal relationships and do not show them in their figure; (Bernard et al., 2005, p. 7) extend the original figure with these horizontal relationships. The vertical relationships express that a local SDI delivers data to a state SDI, which is composed of many local SDIs. In this respect the state SDI faces down, but it also has to face up, because it has to deliver data to the SDI above it: the national SDI. (Bill, 2010) gives examples for the different levels, which make the levels and their relationships easier to understand:
• global SDI – Global Spatial Data Infrastructure
• regional SDI – INSPIRE
• national SDI – SDI for Germany (GDI-DE)
• state SDI – SDI for Mecklenburg-Vorpommern (GDI-MV)
• local SDI – GeoPortal of the city of Rostock
Figure 2.4: SDI hierarchy with vertical and horizontal relationships: global, regional, national, state, local and corporate SDI (modified after (Rajabifard & Williamson, 2001) and (Bernard et al., 2005))
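The vertical relationships of this hierarchy can be sketched as a simple tree in which every SDI points to the level it delivers data to. The sketch rests on stated assumptions: the class `SDI` and its methods are hypothetical, only vertical relationships are modelled, the example levels follow (Bill, 2010), and the standards attached to each level are illustrative placeholders. It also treats “very likely to use the same standard” as certain inheritance, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SDI:
    """One node of the SDI hierarchy (hypothetical model, vertical links only)."""
    name: str
    level: str
    parent: Optional["SDI"] = None      # the SDI this one delivers data to
    standards: Set[str] = field(default_factory=set)

    def delivery_chain(self):
        """Names of this SDI and every SDI above it, bottom to top."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain

    def inherited_standards(self):
        """Standards adopted at this level or at any level above it."""
        result, node = set(), self
        while node is not None:
            result |= node.standards
            node = node.parent
        return result


# Example levels after (Bill, 2010); the standards sets are placeholders.
gsdi = SDI("Global Spatial Data Infrastructure", "global")
inspire = SDI("INSPIRE", "regional", gsdi, {"INSPIRE data specifications"})
gdi_de = SDI("GDI-DE", "national", inspire, {"OGC WMS"})
gdi_mv = SDI("GDI-MV", "state", gdi_de)
rostock = SDI("GeoPortal Rostock", "local", gdi_mv)
```

Here `rostock.delivery_chain()` walks from the GeoPortal of the city of Rostock up to the global level, and `rostock.inherited_standards()` collects what the levels above would push down.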
The hierarchical levels affect all of the components pointed out before. For instance, when the national SDI uses a certain standard for its data or data access, the state SDI below it is very likely to use this standard as well. The same holds for the horizontal relationships, because two states are more likely to use the same standards than a state is to adopt standards from a local SDI (Bernard et al., 2005, p. 7). In Europe (regional SDI), this is addressed by the INSPIRE directive, which specifies guidelines for the national SDIs (e.g. GDI-DE in Germany).
Another approach to classifying SDIs is to look at their thematic scope. There are many SDIs for specific data coverages, such as urban planning and sustainable land management (Groot, 1997) or archaeological and built heritage (McKeague et al., 2012), which have no specific designation or name of their own. The term Environmental Spatial Data Infrastructure can be found, for example, in (Fabian, 2003), although it does not seem to be widely used for SDIs handling environmental data; in Germany, for instance, PortalU is an SDI for environmental data. In the marine domain, however, dedicated terms evolved to describe these SDIs: Marine Data Infrastructures (MDI), Marine Spatial Data Infrastructures (MSDI) or Marine Geospatial Data Infrastructures (MGDI, used in Canada). In the domain of Integrated Coastal Zone Management (ICZM), the term Coastal Spatial Data Infrastructure (CSDI) is often used, too. The widespread use of these terms can be seen in section 3.2, where different approaches to implementing MSDIs in many countries, including Australia, Canada and Ireland, are presented.
2.1.5 Marine SDIs
The term MSDI dates back to at least 2001 (Vessie et al., 2001), but it was probably in use earlier, given that the CoastGIS conference series began in 1995. According to (Strain, 2006), MSDIs, like SDIs, are about the exchange and sharing of spatial data, with the significant difference that SDIs focus primarily on land-related data, while MSDIs aim at improved access to marine data in order to advance marine and coastal zone administration and management. Figure 2.5 shows some of the activities that marine and coastal zone administration involves and that an MSDI has to cover.
Figure 2.5: Marine Administration and related activities, including marine industries, marine planning and policy management, resource management, policing and conflict resolution, legislation and conventions, the institutional framework and marine protected areas (modified after (Strain et al., 2006))
(Russell, 2009) gives a fairly comprehensive definition of the term MSDI, stating that an MSDI is “[. . .] the component of a National SDI that encompasses marine and coastal geographic and business information in its widest sense. An MSDI would typically include information on seabed bathymetry (elevation), geology, infrastructure (e.g. wrecks, offshore installations, pipelines, cables); administrative and legal boundaries, areas of conservation and marine habitats and oceanography.” (Bartlett et al., 2004, p. 6) also argue that it “[. . .] is not possible to develop a coastal SDI in isolation from the broader national or regional SDI (NSDI)” and that a “[. . .] CSDI will typically be a subset of a more comprehensive NSDI because the coastal zone covers multiple physical and institutional spaces included in the generic NSDI.”
However, an MSDI is not always a component of a national SDI, as (Strain, 2006) also gives examples of MSDIs at the regional and global levels. While she does not mention a coherent example of a regional MSDI, she lists two global MSDI initiatives: the Global Ocean Observing System (GOOS) and Oceans 21. An example of a regional MSDI (although it does not call itself an SDI or MSDI) is the Oregon Coastal Atlas³.
Having classified MSDIs within the hierarchical system (global, national, regional), the other aspect outlined in (Rajabifard & Williamson, 2001) – the components of an SDI (data, standards, policies, access networks and people) – has to be examined for its applicability to the marine environment.
Generic service standards such as those of the OGC also apply to the marine domain. However, because ISO/TC 211 (see subsection 2.2.1 on page 19) focuses mostly on terrestrial spatial data, standards for marine (meta)data are needed. Building such standards requires coordination, which is the purpose of the Intergovernmental Oceanographic Commission (IOC), established in 1960: “its mission is to promote international cooperation and to coordinate programmes in research, services and capacity building to learn more about the nature and resources of the oceans and coastal areas, and to apply this knowledge to improved management, sustainable development and protection of the marine environment and the decision making processes of States.”⁴ To achieve this, the IOC established the International Oceanographic Data and Information Exchange (IODE) in 1961. IODE facilitates “[. . .] the exploitation, development, and exchange of oceanographic data and information between participating Member States and by meeting the needs of users for data and information products.”⁵ The International Hydrographic Organisation (IHO) and the International Hydrographic Bureau developed a standard for hydrographic data (S-57). (Strain et al., 2006) also addressed the issue of standards for MSDIs and state that the IOC developed the standard marineXML. However, this effort appears to have been discontinued, although marineXML may still be in use in Australia. Furthermore, there is the Hydrologic Markup Language (HydroML), which allows “[. . .] the definition of hydrologic information”⁶, and XHydro, which “[. . .] is an XML format for inter-departmental and cost-efficient time-series data exchange”⁷ and was developed by the German Federal Institute of Hydrology (BfG).
³ http://www.coastalatlas.net/
⁴ http://ioc-unesco.org/index.php?option=com_content&view=article&id=14:about-the-ioc
⁵ http://www.iode.org/index.php?option=com_content&view=article&id=385&Itemid=34
⁶ http://water.usgs.gov/XML/NWIS/nwis_hml.htm
⁷ http://www.xhydro.de/index_en.html
(Strain et al., 2006) also state that policies cover access, data custodianship, conformity, quality, content, industry engagement, avoidance of duplication and sensitivity. Except for data quality, data access and privacy, all these fields are the same in the marine domain as for terrestrial (land-based) spatial data. Data quality may be more difficult to achieve because of the complexity of the marine environment (complex measurements and processes). While data access onshore poses no problem thanks to fixed-line data transfer, offshore use may require wireless data transfer, which can be problematic. Because countries are reluctant to share spatial information relating to their marine jurisdictions, different privacy policies may be needed for offshore data.
Offshore data is also the only difference between terrestrial and marine data when it comes to access networks, because the technology used for data transfer and access on land is not appropriate for offshore use. Examples of access networks in the marine domain include inter alia the Global Ocean Observing System (GOOS), managed by the IOC, which provides⁸ “[. . .] a coordinated approach to deployment of observation technologies, rapid and universal dissemination of data flows and delivery of marine information to inform and aid marine management and decision makers and to increase the appreciation of the general public of our changeable oceans.” Further examples are the Integrated Ocean Observing System (IOOS) and the Global Monitoring for Environment and Security (GMES, now Copernicus⁹) initiative.
People are as important in the marine domain as in the terrestrial domain: “The key to success in SDI initiatives are partnerships within and between organisations involved in marine administration and spatial information.” (Strain et al., 2006)
(Strain et al., 2006) also state that data collection and updating are more difficult in the marine environment because it is dynamic and multi-dimensional (to a greater extent than land-based spatial data). They also point out two key issues concerning data, which are the same as for SDIs in general: availability and interoperability.
The same source also lists the following “fundamental datasets” for MSDIs:
• cadastral
• address
• transport
• administrative & political boundaries
• elevation
• hydrography
• imagery
• bathymetry
• marine protected areas
• oceanography
• sea level
• waves
• water quality
• sea floor composition
• meteorological conditions
• biodiversity regionalization
To identify further data sets of interest, the INSPIRE directive (described in depth in subsection 2.5.1 on page 44) is consulted. (Korduan, 2013) analyses the coverage of marine data within the INSPIRE directive. He states that 19 themes are important for the marine domain, of which the most important ones are:
(1) Oceanographic Geographical Features (OF, e.g. sea surface temperature, currents, wave heights or salinity) (Millard et al., 2013a)
(2) Land Use (LU, use and functions of a territory, e.g. 1_4_AquacultureAndFishing) (Salgé et al., 2013)
(3) Energy Resources (ER, offshore wind parks, energy derived from tidal movement, wave motion or ocean current) (Tuchyna et al., 2013)
(4) Mineral Resources (MR, mineral resources in or on the sea floor) (Serrano et al., 2013)
(5) Natural risk zones (NZ, marine-related hazard types like floods) (Harrison et al., 2013)
(6) Environmental monitoring Facilities (EF, Oceanographic Geographical Features are derived from Environmental monitoring Facilities) (Daffner et al., 2013)
(7) Habitats and biotopes (HB, includes fresh water and marine areas) (Hinterlang et al., 2013)
(8) Bio-geographical regions (BR, “Areas of relatively homogeneous ecological conditions with common characteristics”, e.g. Baltic sea) (Roscher et al., 2013)
(9) Sea Regions (SR, “A Sea Region is a defined area of common (physical) characteristics”, e.g. coastline) (Millard et al., 2013b)
(10) Area management/restriction/regulation zones and reporting units (AM, “areas managed, regulated or used for reporting”) (Lihteneger et al., 2013)
(11) Agricultural and Aquaculture Facilities (AF, e.g. marine and freshwater aquaculture) (Busznyák et al., 2013)
⁸ https://en.unesco.org/node/119895
⁹ www.copernicus.eu
Figure 2.6: Links between selected INSPIRE themes: Oceanographic Geographical Features (OF), Environmental monitoring Facilities (EF), Sea Regions (SR), Elevation (EL), Area Management or Reporting Units (AM), Hydrography (HY) and Geographic Names (GN)
Figure 2.6 shows that some of the themes are linked with each other and/or with other, non-marine-specific themes. The connections are:
• Oceanographic Geographical Features
  → Oceanographic Geographical Features are derived from Environmental monitoring Facilities (EF)
  → Oceanographic Geographical Features always contain information about a Sea Region (SR)
• Sea Regions
  → Elevation (EL, depth of a Sea Region, not included in the eleven themes of (Korduan, 2013))
  → The main Sea Region class (SeaArea) derives from Hydrography (HY)
  → Geographic Names (GN) are used for the named Sea Regions
  → Geophysical observations (described by the Oceanographic Geographical Features [OF] theme) are made within Sea Regions
  → Areas of the sea may be Area Management or Reporting Units (AM)
• Area Management or Reporting Units
  → Areas of the sea (Sea Regions [SR]) may be Area Management or Reporting Units
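Because the links above form a small directed graph, they can be encoded as an edge list and queried, for example to find every theme that has to be considered together with a given one. The edge list below transcribes the arrows listed above; the relation labels are shortened paraphrases, and the helper name `related_themes` is hypothetical.

```python
# Edges transcribed from the list above: (source theme, target theme, relation).
THEME_LINKS = [
    ("OF", "EF", "is derived from"),
    ("OF", "SR", "contains information about"),
    ("SR", "EL", "takes its depth from"),
    ("SR", "HY", "SeaArea derives from"),
    ("SR", "GN", "is named by"),
    ("SR", "OF", "hosts observations described by"),
    ("SR", "AM", "may be"),
    ("AM", "SR", "may be an area of"),
]


def related_themes(theme):
    """Themes linked to `theme` in either direction, sorted alphabetically."""
    related = set()
    for source, target, _relation in THEME_LINKS:
        if source == theme:
            related.add(target)
        elif target == theme:
            related.add(source)
    return sorted(related)
```

Querying `related_themes("SR")` returns `["AM", "EL", "GN", "HY", "OF"]`, reflecting that Sea Regions are the hub of these connections.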
Except for imagery, all the “fundamental datasets” of (Strain et al., 2006) can be mapped to INSPIRE themes relevant to the marine domain (Korduan, 2013). However, because not all of these data sets are specific to the marine domain, they are first grouped accordingly and then mapped to INSPIRE themes (see table 2.1 for the abbreviations):
• not marine-specific
  ◦ cadastral – LU
  ◦ address – AD
  ◦ transport – TN
  ◦ administrative and political boundaries – SR
  ◦ elevation – SR/EL
  ◦ imagery – no matching theme
  ◦ meteorological conditions – NZ
  ◦ biodiversity regionalization – HB and BR
• marine-specific
  ◦ bathymetry – SR/EL
  ◦ hydrography – SR/HY
  ◦ marine protected areas – HB
  ◦ oceanography – OF
  ◦ sea level – OF
  ◦ waves – OF
  ◦ water quality – OF
  ◦ sea floor composition – OF
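The grouping above is essentially a lookup table, which makes it easy to check, for instance, which INSPIRE themes an MSDI has to cover for its marine-specific datasets and which datasets have no theme at all. The dictionary `DATASET_THEMES` transcribes the list above; the helper functions `unmapped_datasets` and `themes_needed` are hypothetical.

```python
# (marine-specific?, INSPIRE theme codes) per fundamental dataset, transcribed
# from the grouping above; imagery has no corresponding theme.
DATASET_THEMES = {
    "cadastral": (False, ("LU",)),
    "address": (False, ("AD",)),
    "transport": (False, ("TN",)),
    "administrative and political boundaries": (False, ("SR",)),
    "elevation": (False, ("SR", "EL")),
    "imagery": (False, ()),
    "meteorological conditions": (False, ("NZ",)),
    "biodiversity regionalization": (False, ("HB", "BR")),
    "bathymetry": (True, ("SR", "EL")),
    "hydrography": (True, ("SR", "HY")),
    "marine protected areas": (True, ("HB",)),
    "oceanography": (True, ("OF",)),
    "sea level": (True, ("OF",)),
    "waves": (True, ("OF",)),
    "water quality": (True, ("OF",)),
    "sea floor composition": (True, ("OF",)),
}


def unmapped_datasets():
    """Datasets for which no INSPIRE theme is listed."""
    return [name for name, (_, themes) in DATASET_THEMES.items() if not themes]


def themes_needed(marine_only=False):
    """Sorted theme codes covering all (or only the marine-specific) datasets."""
    return sorted({
        theme
        for marine, themes in DATASET_THEMES.values()
        if marine or not marine_only
        for theme in themes
    })
```

For the marine-specific datasets alone, `themes_needed(marine_only=True)` yields the five themes EL, HB, HY, OF and SR, while `unmapped_datasets()` confirms that only imagery lacks a theme.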
Table 2.1: Selected INSPIRE themes and their abbreviations (INSPIRE, 2007)
AD – Addresses
AF – Agricultural and Aquaculture Facilities
AM – Area Management or Reporting Units
BR – Bio-geographical regions
EF – Environmental monitoring Facilities
EL – Elevation
ER – Energy Resources
GN – Geographic Names
HB – Habitats and biotopes
HY – Hydrography
LU – Land Use
MR – Mineral Resources
NZ – Natural risk zones
OF – Oceanographic Geographical Features
SR – Sea Regions
TN – Transport Networks
In conclusion, the two main differences between the data components of marine and terrestrial SDIs are the fundamental (marine-specific) datasets and the data collection process. Furthermore, scientific data play a much more prominent role in marine SDIs than in terrestrial SDIs.
2.2 Geospatial standards
SDIs (see section 2.1) rely on standards because they build on web services which – in the SDI world – were specified by the Open Geospatial Consortium (OGC, with standards such as WMS and WFS, see subsection 2.2.3) in conjunction with the efforts of the International Organization for Standardization (ISO, technical committee 211, see subsection 2.2.1). Data and services require metadata to be retrievable and easily accessible. For this reason subsection 2.2.2 examines metadata from its roots to internationally accepted standards.
2.2.1 ISO TC 211 and its 191XX series
National efforts like the Content Standard for Digital Geospatial Metadata (CSDGM) were partly superseded by and partly incorporated into an international agreement on geospatial metadata standards. In 1994, ISO formed a technical committee (TC 211) to develop such an international agreement standardizing information related to the spatial domain. According to its overview website¹⁰, ISO/TC 211 is responsible for “[. . .] standardization in the field of digital geographic information [which] aims to establish a structured set of standards for information concerning objects or phenomena that are directly or indirectly associated with a location relative to the Earth.”
¹⁰ http://www.isotc211.org/Outreach/Overview/Overview.htm
The outcomes of its work are the 191XX series of international standards of which selected ones with relevance to this thesis will be explained further in the next few paragraphs and ISO 19115 will be discussed in subsection 2.2.2.
19119 – Services
Services play an important role in the world of SDIs, because SDIs are collections of distributed services. With ISO 19119, ISO aimed to standardize services by¹¹
(1) providing an abstract framework to allow coordinated development of specific services,
(2) enabling interoperable data services through interface standardization,
(3) supporting development of a service catalogue through the definition of service metadata,
(4) allowing separation of data instances & service instances,
(5) enabling use of one provider's service on another provider's data and
(6) defining an abstract framework which can be implemented in multiple ways.
Item (3) is of special importance for the development of an SDI. Metadata catalogues generally assist users in searching for spatial data; because ISO 19119 also specifies metadata about services, and not just about data, users can find services as well and see what data a service offers. Among other things, the service metadata document states which requests the service supports, which layers it offers and which coordinate reference systems are used (Müller et al., 2004, p. 126). Subsection 2.2.3 outlines how the OGC built upon this standard¹² and developed specifications for implementing services like the web map service (WMS) and the web feature service (WFS).
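As an illustration of what such service metadata look like in practice, the sketch below reads the supported requests, the named layers and their coordinate reference systems from a WMS 1.3.0 capabilities document with Python's standard `xml.etree.ElementTree`. The XML is a heavily trimmed, hypothetical fragment (a real response contains many more mandatory elements), and the function `summarize` is illustrative. In WMS, child layers inherit the CRS values of their parent layers, which the recursive walk reflects.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed, hypothetical WMS 1.3.0 capabilities document.
CAPABILITIES = """<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Capability>
    <Request>
      <GetCapabilities/>
      <GetMap/>
      <GetFeatureInfo/>
    </Request>
    <Layer>
      <Title>Marine data</Title>
      <CRS>EPSG:4326</CRS>
      <Layer><Name>sea_regions</Name><CRS>EPSG:25832</CRS></Layer>
      <Layer><Name>bathymetry</Name></Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

NS = {"wms": "http://www.opengis.net/wms"}


def summarize(xml_text):
    """Return the supported requests and a {layer name: sorted CRS list} mapping."""
    root = ET.fromstring(xml_text)
    request_el = root.find("wms:Capability/wms:Request", NS)
    # Tags come back as "{namespace}LocalName"; keep only the local name.
    requests = [child.tag.split("}", 1)[1] for child in request_el]

    layers = {}

    def walk(layer, inherited_crs):
        # Child layers inherit their parent's CRS list and may add to it.
        crs = inherited_crs | {c.text for c in layer.findall("wms:CRS", NS)}
        name = layer.find("wms:Name", NS)
        if name is not None:
            layers[name.text] = sorted(crs)
        for child in layer.findall("wms:Layer", NS):
            walk(child, crs)

    for top in root.findall("wms:Capability/wms:Layer", NS):
        walk(top, set())
    return requests, layers
```

A catalogue harvesting this document would learn that the service supports GetCapabilities, GetMap and GetFeatureInfo and that the layer `sea_regions` is available in two coordinate reference systems.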
19156 – Observations and measurements
Especially in the marine domain, much data originates from sensors, which is why ISO 19156 – observations and measurements (O&M) – is of major importance to MSDIs. Although O&M is an ISO standard, the OGC was involved in developing its implementation specifications. On its website, ISO characterizes O&M as follows¹³: “ISO 19156:2011 defines a conceptual schema for observations, and for features involved in sampling when making observations. These provide models for the exchange of information describing observation acts and their results, both within and between different scientific and technical communities.” (Castano et al., 1998, p. 290) define the term conceptual schema as a composition of elements and links, where
“an element abstracts the constructs used in conceptual models to describe classes of real-world objects (e.g., entity, class). A link abstracts the constructs used in conceptual models to describe relationships between real-world objects due to the aggregation and generalization abstraction mechanisms (e.g., relationships, “is-a” links).”
¹¹ cf. (ISO, 2001, p. 4)
¹² And of course on other ISO standards like 19136 (GML) as well.
¹³ http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=32574&commid=54904
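The O&M notion of an observation can be illustrated with a plain data class. This is a deliberately simplified sketch, not the ISO 19156 schema: it keeps only the core properties (feature of interest, observed property, procedure, phenomenon time, result time and result), restricts the result to a number with a unit (the UCUM code `Cel` for degrees Celsius), and all identifiers in the example are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """Simplified O&M-style observation (a sketch, not the ISO 19156 schema)."""
    feature_of_interest: str   # the real-world feature whose property is observed
    observed_property: str     # the property that is observed
    procedure: str             # the sensor or method that produced the result
    phenomenon_time: datetime  # when the observed value applies
    result_time: datetime      # when the result became available
    result: float              # restricted to a number here; O&M allows any type
    unit: str

    def delay_seconds(self):
        """Seconds between the observed phenomenon and the availability of its result."""
        return (self.result_time - self.phenomenon_time).total_seconds()


# Hypothetical sea surface temperature reading from a buoy.
obs = Observation(
    feature_of_interest="station_A",
    observed_property="sea_surface_temperature",
    procedure="buoy_thermistor_1",
    phenomenon_time=datetime(2013, 7, 1, 12, 0, tzinfo=timezone.utc),
    result_time=datetime(2013, 7, 1, 12, 5, tzinfo=timezone.utc),
    result=17.4,
    unit="Cel",
)
```

Separating the phenomenon time from the result time mirrors a distinction O&M makes explicitly; for sensor data it lets a user see, for example, that this reading became available five minutes after it was measured.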
19136 – Geography Markup Language (GML)
The ISO standards discussed so far define metadata and conceptual schemas describing how data are stored. With GML, ISO also offers a standard for how spatial objects themselves are stored. GML was originally developed by the Open Geospatial Consortium (OGC, see subsection 2.2.3). Because it was widely adopted and used, the GML specification has been incorporated into ISO's range of international standards concerning spatial data (191XX series). The Encyclopedia of GIS (Raimundo & Chang-Tien, 2008) defines GML as follows: “Geography Markup Language (GML) is an open-source encoding based on the eXtensible Markup Language (XML), and suitable for the representation of geographical objects. Organized as a hierarchy of features, collections, and geometries, among other structures, GML objects are modeled after real-world entities characterized by properties and state.” Furthermore, GML serves as an information exchange and storage format for data sharing by defining a schema for how spatial data can be characterized, so that systems are able to understand each other. As with most XML applications, this schema is the framework for the data and has to be distinguished from the actual data (Raimundo & Chang-Tien, 2008, p. 364).
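A minimal example of the GML encoding: the sketch below writes a single position as a GML 3.2 `Point` with Python's standard `xml.etree.ElementTree`. It is a bare geometry rather than a complete feature or feature collection; the identifier `p1`, the coordinates and the helper name `gml_point` are hypothetical, and the CRS is given as the OGC URI for EPSG:4326, whose axis order is latitude before longitude.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml/3.2"
ET.register_namespace("gml", GML)  # serialize with the conventional "gml" prefix


def gml_point(point_id, lat, lon, srs="http://www.opengis.net/def/crs/EPSG/0/4326"):
    """Encode one position as a GML 3.2 Point element and return it as a string."""
    point = ET.Element(f"{{{GML}}}Point", {f"{{{GML}}}id": point_id, "srsName": srs})
    pos = ET.SubElement(point, f"{{{GML}}}pos")
    # EPSG:4326 orders axes latitude, longitude, so gml:pos lists lat first.
    pos.text = f"{lat} {lon}"
    return ET.tostring(point, encoding="unicode")
```

Calling `gml_point("p1", 54.1, 12.1)` yields a `gml:Point` element whose `gml:pos` reads `54.1 12.1`; any GML-aware client can interpret it without knowing the software that produced it, which is the schema/data separation described above.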
2.2.2 Metadata standards
Without additional data, a river, for instance, would just be a line geometry that could not be distinguished from a street. Only metadata (together with categorization and attribution) allow the two lines to be told apart. Having metadata at all is sufficient within isolated systems, but as soon as systems need to interoperate and interact with each other, a standard is needed that defines what metadata have to look like. Long before geospatial data were catalogued, librarians were the first to use computers to catalogue their data (i.e. books and other physical media). For interoperability, the machine-readable cataloging (MARC) standard evolved. Because MARC is complex and extensive, the Dublin Metadata Core Element Set (Dublin Core for short), which has only 13 data elements, was developed in March 1995. According to (Guptill, 1999, p. 682) it “[. . .] was proposed as the minimum number of metadata elements required to facilitate the discovery of document-like objects in a networked environment
such as the Internet.” However, even before Dublin Core was developed, the US Federal Geographic Data Committee (FGDC) proposed the Content Standard for Digital Geospatial Metadata (CSDGM) in June 1994. (Guptill, 1999, p. 683) states that “The standard was the first focused effort on specifying the information content of metadata for a set of geospatial data. The standard was developed from the perspective of defining the information required by a prospective user to determine the availability of a set of geospatial data, to determine the fitness of a set of geospatial data for an intended use, to determine the means of accessing the set of geospatial data, and to transfer successfully the set of geospatial data.” A more recent standard for geospatial metadata, succeeding the standards mentioned so far, is ISO 19115, which14 “[. . .] defines the schema required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data.” With more than 400 metadata elements, arranged in packages such as reference system information, metadata extension information, data quality information and content information, and each being either mandatory, conditional or optional, ISO 19115 enables interoperability. ISO 19139 builds upon these definitions and defines an XML Schema implementation for them (Bartelme, 2005, pp. 380).
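How lean a Dublin Core description is compared with an ISO 19115 record can be seen in a small example. The sketch below builds a flat Dublin Core record with Python's ElementTree. The namespace http://purl.org/dc/elements/1.1/ is that of the later 15-element Dublin Core Metadata Element Set 1.1 rather than the original 13-element set of 1995; the wrapper element record and all values describe a hypothetical data set.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC)

def dublin_core_record(fields):
    """Build a flat Dublin Core record; `fields` maps element names to values."""
    record = ET.Element("record")          # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(record, "{%s}%s" % (DC, name)).text = value
    return record

rec = dublin_core_record({
    "title": "Bathymetry of a hypothetical harbour approach",
    "creator": "Example agency",
    "subject": "bathymetry",
    "date": "2013-11-05",
    "format": "image/tiff",
    "coverage": "Baltic Sea",
})
print(ET.tostring(rec, encoding="unicode"))
```

All elements are optional and unstructured strings here; ISO 19115, by contrast, nests its elements in packages and marks them as mandatory, conditional or optional, which is what makes automated checks of metadata completeness possible.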
2.2.3 OGC specifications
The Open Geospatial Consortium (OGC) was founded in 1994 and was known as the OpenGIS Consortium until 2004. It is a consortium of commercial, governmental and research organizations whose mission is to advance the development and use of GIS and spatial data by creating common, open standards and specifications that enable interoperability (Lupp, 2008, p. 815). In addition to formal specification languages such as the Geography Markup Language (GML, see ISO 19136 on page 21), the work of the OGC primarily results in OpenGIS Implementation Specifications, which define open interfaces and protocols. Products that conform to these specifications are interoperable. In the following, two of the OGC Web Services (OWS) are discussed specifically, namely the Web Map Service (WMS) and the Web Feature Service (WFS). Other important specifications include the Web Processing Service (WPS), which executes processes (such as buffer, overlay etc.) on input from another service (such as a WFS), and the Catalog Service for the Web (CSW), which provides geospatial metadata and allows it to be searched.
14(ISO, 2002b)
A Web Map Service (WMS) returns maps of spatially referenced data. The maps are produced dynamically from geographic information. A map is “[. . .] a portrayal of geographic information as a digital image file suitable for display on a computer screen. A map is not the data itself.”15 Maps are typically returned to the user in a raster format such as PNG, GIF or JPEG but can also be delivered in a vector-based format such as Scalable Vector Graphics (SVG) or Web Computer Graphics Metafile (WebCGM). A WMS is invoked by submitting a request as a specially constructed URL whose parameters depend on the desired operation. A WMS offers three operations, one of which is optional:
• GetCapabilities: The service returns an XML document describing the service. Among other things, the output contains metadata about the service, such as its title and responsible party, as well as information about the offered layers (name, supported coordinate reference systems [CRS] etc.).
• GetMap: The service returns a map with the given geographic and dimensional parameters.
• GetFeatureInfo (optional): The service returns information about particular objects (features) shown on the map.
In case of a GetMap request, the URL includes parameters indicating which area is mapped (BBOX), the width and height of the output image, the CRS and which data will be depicted on the map (LAYERS). This leads to a request such as this one16:
http://gdisrv.bsh.de/arcgis/services/CONTIS/Administration?
REQUEST=GetMap
&SERVICE=WMS
&VERSION=1.3.0
&CRS=CRS:84
&BBOX=3.0,53.0,20.0,55.0
&LAYERS=7,6,5,4,3,2,1
&WIDTH=640
&HEIGHT=400
&FORMAT=image/png
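The GetMap request above can be assembled programmatically from its key-value parameters. The following sketch only builds and decodes the URL with Python's standard urllib.parse; it does not contact the BSH server. The helper names getmap_url and query_params are hypothetical. The sketch additionally sends an empty STYLES parameter, because WMS 1.3.0 lists STYLES as mandatory for GetMap, where an empty value requests each layer's default style; the example URL in the text omits it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def getmap_url(endpoint, layers, bbox, width, height,
               crs="CRS:84", fmt="image/png", version="1.3.0"):
    """Assemble a WMS GetMap request URL from its key-value parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": version,
        "REQUEST": "GetMap",
        "LAYERS": ",".join(str(layer) for layer in layers),
        "STYLES": "",  # mandatory in WMS 1.3.0; empty means default styles
        "CRS": crs,
        # With CRS:84 the BBOX order is min lon, min lat, max lon, max lat
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    # urlencode percent-encodes reserved characters such as ',' ':' and '/'
    return endpoint + "?" + urlencode(params)

def query_params(url):
    """Decode the query string again, keeping empty values such as STYLES."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)

url = getmap_url("http://gdisrv.bsh.de/arcgis/services/CONTIS/Administration",
                 layers=[7, 6, 5, 4, 3, 2, 1],
                 bbox=(3.0, 53.0, 20.0, 55.0), width=640, height=400)
print(url)
print(query_params(url)["BBOX"])
```

Building the URL from a parameter dictionary rather than by string concatenation keeps the encoding of reserved characters consistent, and the server decodes them back to the values shown in the request above.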
The response to this request is depicted in figure 2.7 and shows multiple layers. A layer is a “basic unit of geographic information that may be requested as a map from a server”17. A layer can also be defined as a set consisting of at least one feature. When two
15(de la Beaujardiere, 2006, p. v)
16Taking the WMS “Continental Shelf Information System” by the Federal Maritime and Hydrographic Agency (BSH) as an example.
17(de la Beaujardiere, 2006, p. 7)
or more layers (or maps, each of which can include multiple layers) that share the same geographic parameters and output size are combined, an overlay can be produced (like the one shown in figure 2.7).
Figure 2.7: Image response of a GetMap request to a WMS
In contrast to a WMS, a Web Feature Service (WFS) works with and outputs vector data, i.e. features. According to the WFS Implementation Specification18, a feature is an “abstraction of real world phenomena”. A WFS is chosen over a WMS if geospatial operations are to be performed on the data, for instance in order to create a buffer. A WFS thus goes beyond end-user visualization, although it can be used for this as well (Michaelis & Ames, 2008, p. 1261). The four WFS types (or conformance classes) are distinguished by the operations they support. The simple WFS (as well as the basic WFS19) implements the operations
• GetCapabilities,
• DescribeFeatureType,
• ListStoredQueries,
• DescribeStoredQueries and
• GetFeature.
The GetCapabilities operation is analogous to that of a WMS. While DescribeFeatureType describes the structure of a particular feature type, GetFeature returns specific features in GML format, i.e. it returns geodata in vector form. The two remaining operations handle stored query expressions. According to (Vretanos, 2010, p. 30), a query expression “[. . .] is an action that directs a server to search its data store for resources that satisfy some filter expression encoded within the query.” Furthermore, a stored query expression
18(Vretanos, 2010, p. 4)
19The two only differ in the way the GetFeature operation is executed.
“[. . .] is a persistent, parameterized, identifiable query expression. A stored query can be repeatedly invoked using its identifier with different values bound to its parameters each time.”20 The operation ListStoredQueries lists the available stored queries, and DescribeStoredQueries returns detailed information about them. In addition to these operations, a transactional WFS also implements the Transaction operation. This operation enables “[. . .] clients [to] create, modify, replace and delete features in the web feature service’s data store.”21 A locking WFS furthermore implements the operation GetFeatureWithLock and/or LockFeature. Locking ensures the serializability of transactions: because a user first locks a feature, no other user can alter it while it is being modified. A user can either lock features with LockFeature or request and lock them in one step with GetFeatureWithLock, and can then safely modify them (Vretanos, 2010) and (Sinha, 2008). Bringing together the findings regarding SDI components (subsection 2.1.3 on page 11) and standards yields a good overview: the components, how they interact and which standards are important for them are shown in figure 2.8.
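The difference between an ad hoc query and a stored query shows up directly in the request parameters. The following sketch builds WFS 2.0.0 key-value-pair requests with Python's standard library; the endpoint, the feature type app:Buoy and the feature identifier are hypothetical. The stored query used is GetFeatureById with the identifier urn:ogc:def:query:OGC-WFS::GetFeatureById, which WFS 2.0 requires every server to offer; its single parameter ID is bound to a value in the request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "https://example.org/wfs"   # hypothetical WFS 2.0 endpoint

def wfs_request(request, **extra):
    """Build a key-value-pair (KVP) WFS 2.0.0 request URL."""
    params = {"SERVICE": "WFS", "VERSION": "2.0.0", "REQUEST": request}
    params.update(extra)
    return ENDPOINT + "?" + urlencode(params)

def params_of(url):
    """Decode a request URL back into its parameters."""
    return parse_qs(urlsplit(url).query)

# Ad hoc query: up to 10 features of a hypothetical feature type, returned as GML
ad_hoc = wfs_request("GetFeature", TYPENAMES="app:Buoy", COUNT=10)

# Stored queries: list those the server offers ...
listing = wfs_request("ListStoredQueries")
# ... and invoke one by its identifier, binding a value to its ID parameter
by_id = wfs_request("GetFeature",
                    STOREDQUERY_ID="urn:ogc:def:query:OGC-WFS::GetFeatureById",
                    ID="buoy.1")

for url in (ad_hoc, listing, by_id):
    print(url)
```

With a stored query, only the identifier and the parameter values travel in the request; the query expression itself stays on the server and can be invoked repeatedly with different values.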
2.3 Standards for reference models
Section 2.2 described the standards which are the foundations of an SDI. The practical implementation of an SDI may be aided by a model. Because such a model provides the framework for an SDI and forms its foundation, it is called a reference model. Several existing reference models are outdated, such as the Purdue Enterprise Reference Architecture (PERA); Process, Organization and Location and Data, Applications and Technology (POLDAT); and the Open-system environment (OSE) reference model (RM). Other reference models only fit specific fields, e.g. business, with models such as the Workflow Reference Model, the Business Reference Model and, to some degree, the Business Process Execution Language (BPEL) by the Organization for the Advancement of Structured Information Standards (OASIS). Another OASIS approach is the SOA Reference Model, where SOA stands for service-oriented architecture. Because an SDI is a service-oriented architecture, this reference model might be of interest. However, because it focuses solely on architectural aspects, and on services in particular, it does not fit the broad scope of SDI development. A standard for reference models that represents all aspects of SDI development (through its viewpoints) is the Reference Model of Open Distributed Processing
20(Vretanos, 2010, p. 42) 21(Vretanos, 2010, p. 90)
[Figure: the SDI components People, Applications, Metadata Catalog and Data, connected via Web/Data Services over an intranet or the Internet (HTTP, Z39.50, FTP, CD, ...), annotated with the relevant standards: ISO 191XX series, ISO Feature Catalog, OGC CSW Catalog, OGC WMS, WFS, ..., GML, KML, ..., national/international content standards and W3C XML]
Figure 2.8: Components, their interaction and standards in SDI (inspired by (Nebert & Anthony, 2010, p. 57))
(RM-ODP, see subsection 2.3.1) by the International Organization for Standardization (ISO). A similar, earlier approach (the “4+1” View Model of Software Architecture) is discussed in subsection 2.3.2. Because modelling today depends heavily on standardized notation to ensure interoperability, the relationship between the Unified Modeling Language (UML) and the approaches for reference models is outlined in subsection 2.3.3.
2.3.1 RM-ODP
The purpose of building a reference model is to define a framework that structures large and complex distributed systems, of which spatial data infrastructures are an example (Vallecillo, 2001, p. 2). The basis for such efforts can be ISO’s Reference Model of Open Distributed Processing (RM-ODP)22. Because it is a well-defined ISO standard, RM-ODP was chosen as the basis for implementing the interoperable infrastructure for the MSDI of Germany (MDI-DE). Furthermore, according to (Hjelmager et al., 2008, p. 3), RM-ODP has already been widely adopted as the conceptual basis for other reference models like
22see (ISO, 1998b), (ISO, 1996a), (ISO, 1998a) and (ISO, 1996b)
ISO standard 19101 (Geographic Information – Reference model (ISO, 2002a)); the OGC Reference Model, which in an earlier, now deprecated version (Percivall, 2003, p. 3) states that RM-ODP is applied in two ways: “1) a way of thinking about architectural issues in terms of fundamental patterns or organizing principles, and 2) a set of guiding concepts and terminology.”; and the Geospatial Interoperability Reference Model (G.I.R.M.) by the Federal Geographic Data Committee (FGDC), which uses the computational and information viewpoints of RM-ODP. Viewpoints are the core of RM-ODP and make it possible to focus on specific parts of an architecture or framework. Viewpoints are necessary because different stakeholders or actors have distinct interests in a system: aspects that are relevant to developers are not necessarily relevant to customers. Taking classes as an example, these are of interest to developers, whereas customers are more interested in what the system provides them than in technical details such as classes (Staveley, 2011).
[Figure: the enterprise, information, computational, engineering and technology viewpoints on the system and its environment; the figure also names the institutions LUNG, BfN, LKN, BSH, LLUR, BAW, GG, ...]
Figure 2.9: RM-ODP’s generic and complementary viewpoints on the system and its environment
Viewpoints As depicted in figure 2.923, there are five generic and complementary viewpoints on the system to be modelled and its environment, which (Vallecillo, 2001, p. 3) and (ISO, 2009, p. 5) describe as follows:
23cf. (ISO, 2009, p. 5) and http://en.wikipedia.org/wiki/File:RM-ODP_viewpoints.jpg
• enterprise viewpoint
– What for? Why? Who? When?
– focuses on the purpose, scope and policies for the system
– describes the business requirements and how to meet them
• information viewpoint
– What is it about?
– focuses on the semantics of the information and the information processing performed
– describes the information managed by the system and the structure and content type of the supported data
• computational viewpoint
– How does each bit work?
– enables distribution through functional decomposition of the system into objects which interact at interfaces
– describes the functionality provided by the system and its functional decomposition
• engineering viewpoint
– How do the bits work together?
– focuses on the mechanisms and functions required to support distributed interactions between objects in the system
– describes the distribution of processing performed by the system to manage the information and provide the functionality
• technology viewpoint
– With what?
– focuses on the choice of technology of the system
– describes the technologies chosen to provide the processing, functionality and presentation of information
2.3.2 The “4+1” View Model of Software Architecture
Another viewpoint-based approach to describing an architecture was introduced by Philippe Kruchten in 1995 (Kruchten, 1995). Its aim was the same as that of RM-ODP: splitting the different aspects of a system into multiple views and describing an architecture with these views, so that the requirements of the different stakeholders can be addressed. To achieve this goal, he proposed these five (4+1) main views:
(1) logical view: object model of the design, contains information about the parts of the system
(2) process view: captures the concurrency and synchronization aspects of the design and also encompasses some non-functional requirements such as performance and availability
(3) physical view: describes the mapping(s) of the software onto the hardware and reflects its distributed aspects, e.g. by specifying the number of nodes
(4) development view: describes the static organization of the software in its development environment, focuses on software modules and subsystems
(5) use case view: discovers the architectural elements and validates and illustrates the architecture
The approach is called 4+1 because the use case view is essentially redundant (hence “+1”). Nevertheless, the other views would not have been possible without it: the use cases, or scenarios, are an abstraction of the most important requirements that leaves out specific details, and the other views evolve on this basis (Staveley, 2011).
2.3.3 Use of UML in reference models
Unified Modeling Language (UML) The aim of UML, a standard specified by the Object Management Group (OMG), “[. . .] is to provide system architects, software engineers, and software developers with tools for analysis, design, and implementation of software-based systems as well as for modeling business and similar processes.” The quote already indicates that UML has a very broad scope and can be applied to many domains (OMG, 2011, p. 1). This is reflected by the variety of UML diagrams available. Diagrams give extensive information about a system in a graphical representation, but in most cases they display only part of the system (a subset of its classes, components etc.) (OMG, 2011, p. 15).
[Figure: Diagram is specialized into Structure Diagram (Class, Component, Object, Composite Structure, Deployment, Package and Profile Diagram) and Behavior Diagram (Activity, Use Case and State Machine Diagram as well as Interaction Diagram, which in turn comprises Sequence, Communication, Interaction Overview and Timing Diagram)]
Figure 2.10: Class diagram of UML diagram types (OMG, 2011, p. 694)
As depicted in figure 2.10, UML diagrams fall into two main categories: structure and behavior diagrams. In contrast to behavior diagrams, which are dynamic in the sense that they show interaction between elements, structure diagrams are static. They therefore only represent elements that are independent of time and that have to be available in the system being modelled. Class diagrams, for example, are structure diagrams: they specify the classes, their attributes and the relationships between the classes of the system (OMG, 2011, p. 694). As already stated, behavior diagrams are dynamic and thus show how the system changes over time. Use case diagrams, for example, are behavior diagrams: they describe the functionality of a system in terms of actors, their goals (represented as use cases) and the relationships and dependencies between these use cases (OMG, 2011, p. 694).
RM-ODP The original documents of the RM-ODP standard mention neither UML as a tool for modelling a system or an infrastructure nor any notation or model development method. But since UML has gained importance in recent years, ISO later proposed a standard for the “Use of UML for ODP system specifications” (ISO, 2009). The standard describes and defines how the viewpoints can be modelled with UML. For three of its five viewpoints, it suggests diagram types that should be used to model certain aspects within these viewpoints. To model the configuration and structure of computational objects and their dependencies, composition and decomposition within the computational viewpoint, the standard recommends a component diagram. Activity diagrams, state charts and interaction diagrams should be used to model interactions between computational objects (ISO, 2009, p. 44). Within the engineering viewpoint, a configuration of engineering objects structured as clusters, capsules or nodes is expressed by component diagrams, the activities within them by activity diagrams, and the interactions between the engineering objects by sequence, activity and interaction diagrams. Component diagrams are also used to model the structure of a node (ISO, 2009, p. 58). The technology viewpoint models how specifications are implemented using component diagrams. It also models the structure of node instances and the communication links between them using deployment diagrams (ISO, 2009, p. 60).
The “4+1” View Model of Software Architecture While (Kruchten, 1995) proposed a notation for each view, he could not have made proposals regarding which UML diagrams to use for the views, because the “4+1” View Model was put forward two years before UML was developed. However, (Staveley, 2011) and (Kontio, 2005), among others, propose which UML diagrams can be used for each view24:
• logical view: class, object, state machine, interaction (e.g. sequence) and communication diagrams
• process view: activity diagrams
24(Staveley, 2011) in italics, (Kontio, 2005) in bold and both in italics and bold
• physical view: deployment diagrams
• development view: package and component diagrams
• use case view: use case diagrams
2.4 Standards for knowledge representation
The GSDI Cookbook (Nebert, 2004, p. 8) mentions metadata as one important component of an SDI (see subsection 2.1.3 on page 11). Metadata is data about data: it describes the data by stating its owner, its thematic scope, how the data was collected, how often it is updated and so on. All that is needed is a form with blank fields into which the user types the information describing the data (set). In order to index the metadata fields, terms are needed. Semantic interoperability (see subsection 2.1.1 on page 9) is needed to ensure that different actors or systems have a common understanding of the meanings of terms. Interoperability was already defined by (Heiler, 1995, p. 271) (in subsection 2.2.1 on page 19) as “[. . .] the ability to exchange services and data with one another. It is based on agreements between requesters and providers on, for example, message passing protocols, procedure names, error codes, and argument types.” Semantic interoperability makes sure that both the requesters and providers have the same understanding of said services and data. Let us assume a user is entering metadata for a data set about a beach and accidentally types “baech” instead of “beach” in the metadata field keywords. If this mistake is overlooked, nobody looking for data about beaches will be able to find the data set because of the typo. Had the user been required to choose from a list of keywords, or been offered autocompletion while typing, i.e. had a controlled vocabulary (a predefined list of terms) been used, this problem would not have occurred. Synonyms are another argument for the use of a controlled vocabulary or a thesaurus. When a thesaurus is used in conjunction with the search function, the typo problem is eliminated as explained above. A thesaurus can also contain relationships between terms, for example to cover synonyms.
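The benefit of a controlled vocabulary during data entry can be sketched in a few lines. In the following example, the vocabulary, the synonym entry (“strand” as a non-preferred term for “beach”) and the helper names are hypothetical. Autocompletion offers only vocabulary terms, and free input is mapped onto the vocabulary by first resolving synonyms and then suggesting the closest spelling, using difflib.get_close_matches from Python's standard library.

```python
import difflib

# Hypothetical controlled vocabulary and thesaurus entries, for illustration only
VOCABULARY = ["beach", "coast", "dune", "estuary", "lagoon", "sandbank"]
SYNONYMS = {"strand": "beach"}   # non-preferred term -> preferred term (assumed)

def autocomplete(prefix):
    """Offer only vocabulary terms that start with what the user has typed."""
    p = prefix.lower()
    return [term for term in VOCABULARY if term.startswith(p)]

def normalize_keyword(word):
    """Map free input onto the vocabulary: synonyms first, then close spellings."""
    w = word.lower()
    if w in VOCABULARY:
        return w
    if w in SYNONYMS:
        return SYNONYMS[w]
    close = difflib.get_close_matches(w, VOCABULARY, n=1, cutoff=0.6)
    return close[0] if close else None   # None: not acceptable as a keyword

print(autocomplete("be"))            # ['beach']
print(normalize_keyword("baech"))    # beach (typo caught)
print(normalize_keyword("Strand"))   # beach (synonym resolved)
```

Rejecting input that cannot be mapped (the None case) is a deliberate design choice: it forces the user back onto the vocabulary instead of silently storing an unindexable keyword.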
Continuing the “beach” example, it would be helpful if a user looking for data about beaches also received results containing “coast”, because the terms are at times used interchangeably, although scientifically they represent a hierarchy and cannot be used interchangeably. A user with little domain knowledge might nevertheless want to get data about “coast” when searching for the term “beach”. The superordinate concept in the field of knowledge representation is the ontology, which is explained in subsection 2.4.2. Formal languages describing ontologies include the Resource Description Framework (RDF, see 2.4.2.1 on page 34)25
and the Web Ontology Language (OWL), both of which are specifications by the World Wide Web Consortium (W3C). Neither RDF(S) nor OWL is intended or specialized for use with vocabularies; thus they offer only limited labelling capabilities and, especially in the case of OWL, have semantically strict relationships (super-/subclasses rather than weaker ones like “related”). However, the Simple Knowledge Organisation System (SKOS, see 2.4.2.2 on page 37) is specified by the W3C to organize knowledge and model thesauri in RDF. The “simple” in its name means that concept trees and relations are very easy to map. These facts make SKOS a good choice for representing controlled vocabularies on the web. Besides SKOS, several other standards for representing vocabularies are also built on RDF. However, these are specified for other fields, such as DOAP (Description of a Project), FOAF (Friend of a Friend) and SIOC (Semantically-Interlinked Online Communities Project), and thus are not general enough.
25To be precise, RDF Schema (RDFS) is meant here because RDF is a whole family of W3C specifications.
2.4.1 Fundamentals – XML and DOM
The foundations for all the approaches in the field of knowledge representation presented in this thesis were laid by the World Wide Web Consortium (W3C) in the form of the Extensible Markup Language (XML), which it defines as26: “[. . .] a simple, very flexible text format [. . .]. [. . .] XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.” Furthermore, in its specification (Bray et al., 2008), the W3C states about the structure and components of XML that: “XML describes a class of data objects called XML documents [. . .]. XML documents are made up of storage units called entities [. . .].” To work with these documents, the W3C developed the Document Object Model (DOM), which27: “[. . .] is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page.” Later on in this thesis, DOM will be used to implement a tool that converts Excel lists or comma-separated values (CSV) files to SKOS format (see
26http://www.w3.org/XML/
27http://www.w3.org/DOM/
subsection 3.5.2 on page 71). Because the tool is implemented in Java, a tree-based Java API in the spirit of the DOM will be used: JDOM, which according to (Harold, 2002) is “[. . .] an open source, tree-based, pure Java API for parsing, creating, manipulating, and serializing XML documents.”
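The tree-based DOM style of building XML can be sketched independently of Java. The example below uses Python's standard xml.dom.minidom instead of JDOM and is not the converter developed in this thesis; it merely illustrates the same idea of turning tabular rows into SKOS concepts through DOM calls. The CSV layout (columns id, label, broader), the base URI http://example.org/vocab/ and the terms are hypothetical; the RDF and SKOS namespace URIs are the official W3C ones.

```python
import csv
import io
from xml.dom import minidom

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/vocab/"   # hypothetical namespace for the concepts

# Hypothetical input layout: one concept per row, optional broader concept id
CSV_TEXT = """id,label,broader
coast,coast,
beach,beach,coast
dune,dune,coast
"""

def csv_to_skos(csv_text):
    """Build an RDF/XML document of skos:Concept elements through the DOM API."""
    impl = minidom.getDOMImplementation()
    doc = impl.createDocument(RDF, "rdf:RDF", None)
    root = doc.documentElement
    # minidom does not write namespace declarations itself, so add them explicitly
    root.setAttribute("xmlns:rdf", RDF)
    root.setAttribute("xmlns:skos", SKOS)
    for row in csv.DictReader(io.StringIO(csv_text)):
        concept = doc.createElementNS(SKOS, "skos:Concept")
        concept.setAttributeNS(RDF, "rdf:about", BASE + row["id"])
        label = doc.createElementNS(SKOS, "skos:prefLabel")
        label.setAttribute("xml:lang", "en")
        label.appendChild(doc.createTextNode(row["label"]))
        concept.appendChild(label)
        if row["broader"]:   # empty cell: top concept without a broader term
            broader = doc.createElementNS(SKOS, "skos:broader")
            broader.setAttributeNS(RDF, "rdf:resource", BASE + row["broader"])
            concept.appendChild(broader)
        root.appendChild(concept)
    return doc

print(csv_to_skos(CSV_TEXT).toprettyxml(indent="  "))
```

Because the whole document is held as a tree in memory, concepts could still be reordered or amended before serialization, which is the main practical advantage of a DOM-style API for such a conversion.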
2.4.2 Ontologies
As already pointed out, ontologies are the superordinate concept in the domain of knowledge representation, and (Gruber, 2009) states that “[. . .] an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. The representational primitives are typically classes (or sets), attributes (or properties), and relationships (or relations among class members).” This definition already mentions the domain of knowledge and representational primitives (among others, classes and relationships). (Jepsen, 2009) adds to this by specifying “[. . .] that an ontology is a method of representing items of knowledge (ideas, facts, things – whatever) in a way that defines the relationships and classifications of concepts within a specified domain of knowledge.” While this definition also mentions the most important elements of the previous definition (domain of knowledge, concepts [classes] and relationships), it adds one further element: items of knowledge.
Components of and an example for an ontology The two definitions yield a number of components such as concepts/classes, items of knowledge and relationships. (Lord, 2010) states that “Concepts, also called Classes, Types or Universals are a core component of most ontologies. A Concept represents a group of different Individuals, that share common characteristics, which may be more or less specific.” The example depicted in figure 2.11 is about beaches located in different countries. Beaches and countries are the classes in this example. Each class represents a set of individuals: a set of beaches such as Boulders Beach, Venice Beach, Hanalei Bay, Copacabana, Bondi Beach and Byron Bay, and a set of countries such as Australia, Brazil, the United States and South Africa. Individuals are defined in (Lord, 2010) as “[. . .] instances or particulars are the base unit of an ontology; they are the things that the ontology describes or potentially could describe. Individuals may model concrete objects such as people, machines or proteins; they may also model more abstract objects such as this article, a person’s job or a function.”
[Figure: classes Beaches and Countries; isLocatedIn links Copacabana to Brazil, Venice Beach and Hanalei Bay to the United States, Bondi Beach and Byron Bay to Australia, and Boulders Beach to South Africa]
Figure 2.11: Beaches and countries example illustrating relationships among classes and instances (inspired by (Jepsen, 2009))
Sticking to our example, the relationship between the two classes is easy to see: a beach has to be located in one of the countries (relationship isLocatedIn in figure 2.11). Again, (Lord, 2010) offers a definition of the term relationships by pointing out that they “[. . .] describe the way in which individuals relate to each other. Relations can normally be expressed directly between individuals (this article has author Phillip Lord) or between Concepts (an article has author a person) [. . .].” (Gruber, 2009) takes us back to the W3C by stating that “[. . .] ontologies are part of the W3C standards stack for the Semantic Web, in which they are used to specify standard conceptual vocabularies in which to exchange data among systems, provide services for answering queries, publish reusable knowledge bases, and offer services to facilitate interoperability across multiple, heterogeneous systems and databases.” Subsection 2.4.2.1 explains the first formal language for describing ontologies, which handles the data exchange portion of the above quotation: the Resource Description Framework (RDF).
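The beaches-and-countries example can be written down as plain data to show how classes, individuals and relationships interact. In the sketch below, the individuals and isLocatedIn statements are those of figure 2.11, while the representation itself (Python sets and a dictionary declaring which classes a relationship connects) is a hypothetical illustration and not an ontology language. The class-level declaration corresponds to “a beach is located in a country”; the facts correspond to statements between individuals.

```python
# Individuals of the two classes (concepts) in the example
CLASSES = {
    "Beach": {"Copacabana", "Venice Beach", "Hanalei Bay",
              "Bondi Beach", "Byron Bay", "Boulders Beach"},
    "Country": {"Brazil", "United States", "Australia", "South Africa"},
}

# Class-level relationship: a Beach isLocatedIn a Country (domain, range)
SCHEMA = {"isLocatedIn": ("Beach", "Country")}

# Individual-level relationships as (subject, relation, object)
FACTS = {
    ("Copacabana", "isLocatedIn", "Brazil"),
    ("Venice Beach", "isLocatedIn", "United States"),
    ("Hanalei Bay", "isLocatedIn", "United States"),
    ("Bondi Beach", "isLocatedIn", "Australia"),
    ("Byron Bay", "isLocatedIn", "Australia"),
    ("Boulders Beach", "isLocatedIn", "South Africa"),
}

def consistent(facts):
    """Check that every fact links individuals of the classes the schema names."""
    for subject, relation, obj in facts:
        domain, rng = SCHEMA[relation]
        if subject not in CLASSES[domain] or obj not in CLASSES[rng]:
            return False
    return True

def beaches_in(country):
    """Answer a simple query by following isLocatedIn backwards."""
    return sorted(s for s, r, o in FACTS if r == "isLocatedIn" and o == country)

print(consistent(FACTS))          # True
print(beaches_in("Australia"))    # ['Bondi Beach', 'Byron Bay']
```

Because the relationship is declared at class level, a statement with the roles reversed (a country located in a beach) is detected as inconsistent, which is the kind of check an ontology enables beyond a plain keyword list.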
2.4.2.1 Resource Description Framework – RDF
The Resource Description Framework (RDF) is a W3C standard “[. . .] for data interchange on the Web”28 that can be written in XML syntax (see subsection 2.4.1). Furthermore, the W3C states that “RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually
28http://www.w3.org/RDF/
referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications. This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.”
To represent such graphs, RDF uses RDF triples, which are explained in the RDF/XML Syntax Specification29 describing the structure of RDF. As illustrated in figure 2.12, a triple is composed of a subject node, a predicate and an object node: the object describes the subject because the two are related in some way (predicate). All three components can be RDF URI references, but only the object can also be a literal.
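The constraint on the three positions of a triple can be expressed as a small data type. The following sketch is a hypothetical, simplified model in plain Python (no RDF library): subject and predicate must be URI references, the object may be a URI reference or a literal, and a list of triples is read as the edges of a directed, labeled graph. Blank nodes, which RDF also allows as subjects and objects, are left out for brevity. The URIs are those of the Rainer Lehfeldt example used later in this subsection.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class URIRef:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

@dataclass(frozen=True)
class Triple:
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, Literal]

    def __post_init__(self):
        # Subject and predicate must be URI references; only the object may be a literal
        if not isinstance(self.subject, URIRef) or not isinstance(self.predicate, URIRef):
            raise TypeError("subject and predicate must be URI references")
        if not isinstance(self.obj, (URIRef, Literal)):
            raise TypeError("object must be a URI reference or a literal")

def is_valid(subject, predicate, obj):
    """True if the three nodes form an admissible triple."""
    try:
        Triple(subject, predicate, obj)
        return True
    except TypeError:
        return False

def edges(triples):
    """Read triples as a directed, labeled graph: subject --predicate--> object."""
    return [(t.subject.value, t.predicate.value, t.obj.value) for t in triples]

RL = URIRef("http://www.baw.de/kontakt/RL")
NAME = URIRef("http://www.baw.de/kontakt#fullName")
print(edges([Triple(RL, NAME, Literal("Rainer Lehfeldt"))]))
print(is_valid(Literal("Rainer Lehfeldt"), NAME, RL))   # False: literal as subject
```
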
[Figure: a subject node connected to an object node by a directed edge labelled with the predicate]
Figure 2.12: RDF Structure (modified after (Klyne & Carroll, 2004))
In summary, RDF
• is made for data interchange on the web,
• uses URIs,
• handles relationships between things,
• and forms a directed, labeled graph (i.e. its linking structure).
In the RDF Primer (Manola & Miller, 2004), the W3C gives an example of the usage of RDF and of the representation of RDF as a graph. It describes a resource with the following statements, which have been adapted here to the marine domain:
(1) There is a Person identified by http://www.baw.de/kontakt/RL
(2) whose name is Rainer Lehfeldt
(3) whose email address is [email protected]
(4) and whose title is Dr.
29(Klyne & Carroll, 2004)
Listing 2.1: RDF/XML describing Rainer Lehfeldt (inspired by (Manola & Miller, 2004))
(1) is the subject and is identified by a URI, while (2) to (4) are the objects describing this subject. Statements (2) to (4) also contain the predicates: whose name is, whose email address is and whose title is. Figure 2.13 shows that the predicates (arrows in the figure) also have URIs. Besides the three objects (yellow) and the subject (blue), there is also a type in the figure (red). Via the predicate shown in the figure, it specifies that the subject is of type http://www.baw.de/kontakt#Person. The RDF/XML representation corresponding to figure 2.13 is shown in listing 2.1.
Figure 2.13: An RDF Graph describing Rainer Lehfeldt (inspired by (Manola & Miller, 2004))
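One way RDF/XML for this example could be generated with Python's standard-library `xml.etree.ElementTree` module is sketched below; it is an illustration, not a reproduction of listing 2.1. Assumptions: the `contact` namespace `http://www.baw.de/kontakt#` is taken from figure 2.13; the type is expressed by a typed node element, which RDF/XML treats as shorthand for an `rdf:type` statement in the standard W3C RDF namespace (rather than the `http://www.baw.de/rdf/syntax#type` URI of the figure); and the mailbox is modelled as a `mailto:` resource, with a hypothetical placeholder address.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTACT_NS = "http://www.baw.de/kontakt#"  # namespace as shown in figure 2.13

# Prefixes used when serialising (rdf:..., contact:...).
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("contact", CONTACT_NS)


def q(ns: str, local: str) -> str:
    """ElementTree's {namespace}local notation for qualified names."""
    return f"{{{ns}}}{local}"


def describe_person(uri: str, full_name: str, title: str, mailbox: str) -> str:
    root = ET.Element(q(RDF_NS, "RDF"))
    # Typed node element: <contact:Person rdf:about=...> implies rdf:type contact:Person.
    person = ET.SubElement(root, q(CONTACT_NS, "Person"), {q(RDF_NS, "about"): uri})
    ET.SubElement(person, q(CONTACT_NS, "fullName")).text = full_name
    ET.SubElement(person, q(CONTACT_NS, "personalTitle")).text = title
    # The mailbox is a resource (a mailto: URI), so it is referenced, not quoted.
    ET.SubElement(person, q(CONTACT_NS, "mailbox"), {q(RDF_NS, "resource"): mailbox})
    return ET.tostring(root, encoding="unicode")


xml_text = describe_person("http://www.baw.de/kontakt/RL", "Rainer Lehfeldt", "Dr.",
                           "mailto:someone@example.org")  # hypothetical address
print(xml_text)
```

Literal values such as the name become element text, whereas resources are referenced through `rdf:about` and `rdf:resource` attributes; this is the distinction between literal and URI objects from figure 2.12.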
For the representation in RDF triples, subjects are underlined, predicates set in italics and objects in bold. The syntax of this representation30 is subject predicate object; applied to the example describing Rainer Lehfeldt, the result is:
(1) http://www.baw.de/kontakt/RL http://www.baw.de/kontakt# contact
(2) contact#me http://www.baw.de/rdf/syntax#type contact#Person
(3) contact#me contact#fullName 'Rainer Lehfeldt'
(4) contact#me contact#personalTitle 'Dr.'
(5) contact#me contact#mailbox [email protected]
(1) is a declaration for the URI http://www.baw.de/kontakt/RL, which is used in (almost) every subject and predicate, so that the short name contact can be used instead of this rather long URI. (2) defines that the subject contact#me is of the type contact#Person, and (3) to (5) define the subject's name, title and mail address via the predicates contact#fullName, contact#personalTitle and contact#mailbox.
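The notation above only resembles N-Triples. As an illustration, the following Python sketch writes statements in actual N-Triples syntax: full URIs in angle brackets, literals in double quotes with an optional language tag, and each statement terminated by a full stop. The URIs are taken from figure 2.13; the helper names are made up for this sketch, and only the most common escape sequences are handled.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    value: str
    lang: str = ""  # optional language tag, e.g. "en"


def _escape(text: str) -> str:
    # Escape backslashes first so that the later escapes are not doubled.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def term(node: Union[str, Literal]) -> str:
    """Serialise one node: literals in quotes, plain strings as absolute URIs."""
    if isinstance(node, Literal):
        suffix = f"@{node.lang}" if node.lang else ""
        return f'"{_escape(node.value)}"{suffix}'
    return f"<{node}>"


def to_ntriples(triples) -> str:
    """One line per statement: subject predicate object ."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


RL = "http://www.baw.de/kontakt/RL"
K = "http://www.baw.de/kontakt#"
print(to_ntriples([(RL, K + "fullName", Literal("Rainer Lehfeldt")),
                   (RL, K + "personalTitle", Literal("Dr."))]))
# <http://www.baw.de/kontakt/RL> <http://www.baw.de/kontakt#fullName> "Rainer Lehfeldt" .
# <http://www.baw.de/kontakt/RL> <http://www.baw.de/kontakt#personalTitle> "Dr." .
```

Because N-Triples has no prefix declarations, every URI is written out in full; this is exactly the verbosity that the abbreviation declared in statement (1) is meant to avoid.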
2.4.2.2 Simple Knowledge Organisation System – SKOS
Another standard for the representation of controlled vocabularies that is built upon the Resource Description Framework is the Simple Knowledge Organisation System (SKOS, also specified by the W3C), which31:
“[. . .] is a formal language for representing controlled structured vocabularies such as thesauri or classification schemes.”
Because SKOS is an application of RDF, it32 “[. . .] can be used to express the content and structure of a concept scheme as an RDF graph.” Figure 2.14 shows a very simple graph, the first step of the example developed in the following33, which models certain aspects of the term beach. The figure defines a resource (i.e. a term) called ex:beach which is of rdf:type skos:Concept. Listing 2.2 shows the RDF/XML syntax representing the figure's graph. That listing only serves to show the explicit rdf:type usage; the remaining examples use the shortened form shown in listing 2.3.
30 similar to the N-Triples notation
31 (Miles, 2006)
32 (Miles et al., 2005)
33 cf. (Miles & Brickley, 2005)
Listing 2.2: RDF/XML syntax of the SKOS concept beach
Listing 2.3: RDF/XML syntax of the SKOS concept beach (shortened)
Throughout the example, the prefix skos: abbreviates the URI http://www.w3.org/2004/02/skos/core#, so that, e.g., skos:prefLabel written out is http://www.w3.org/2004/02/skos/core#prefLabel. Two further prefixes are used: rdf:, which abbreviates the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#, and ex:, which stands in for a custom namespace and is defined as http://example.net/concepts.
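How such prefixed names are expanded into full URIs can be sketched in Python as follows. The function name `expand` is hypothetical, and one assumption is made explicit in the code: a trailing `#` is appended to the ex: namespace given above, so that local names such as beach remain separated from the namespace.

```python
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    # Assumption: a trailing "#" is added to the ex: namespace given in the text
    # (http://example.net/concepts) so that local names stay separated.
    "ex": "http://example.net/concepts#",
}


def expand(qname: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'skos:prefLabel' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {qname!r}")
    return prefixes[prefix] + local


# The single statement of figure 2.14, written out with full URIs.
statement = tuple(expand(t) for t in ("ex:beach", "rdf:type", "skos:Concept"))
```

Expansion is plain string concatenation of namespace and local name, which is why the choice of separator at the end of a namespace URI matters.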
Figure 2.14: An RDF Graph defining the SKOS concept beach, i.e. ex:beach rdf:type skos:Concept (roughly based on (Miles & Brickley, 2005))
SKOS classes The first element of the SKOS data model (skos:Concept) was introduced in the short example above. skos:Concept is a SKOS class34; a SKOS concept35 “[. . .] can be viewed as an idea or notion; a unit of thought.” According to the SKOS reference (Miles & Bechhofer, 2009), there are three further SKOS classes besides skos:Concept (for brevity, only skos:Concept is explained in more detail):
34 Note that Lord (2010) uses class synonymously with concept.
35 (Miles & Bechhofer, 2009)
• skos:Collection – labeled and/or ordered groups of SKOS concepts
• skos:OrderedCollection – ordered group with meaningful ordering
• skos:ConceptScheme – aggregation of one or more SKOS concepts (used for data from unknown or external sources)
SKOS properties There is a range of properties which can be assigned to a skos:Concept:
(1) Labelling properties
(2) Documentation properties
(3) Semantic relationships
(1) Labelling36 “[. . .] means assigning some sort of token to a resource, where the token is intended to be used to denote (label) the resource in natural language discourse and/or in representations intended for human consumption.” SKOS offers five properties:
(1) skos:prefLabel
(2) skos:altLabel
(3) skos:hiddenLabel
(4) skos:prefSymbol
(5) skos:altSymbol
Symbolic labelling ((4) and (5)) labels a concept with an image. More important for the usage in this thesis are the labelling properties (1) to (3). (1) and (2)37 “[. . .] allow you to assign preferred and alternative lexical labels to a resource.” A language tag can be applied to these labels, as figure 2.15 and listing 2.4 show. This makes it possible to build a multilingual thesaurus with SKOS, so that users of a web application later see the labels of terms in the language they have configured. Listing 2.4 also contains skos:hiddenLabels (3), which can be accessed by applications (e.g. for text-based indexing and search functions) but are not displayed otherwise. They can be used for common misspellings, for example, so that users find certain terms even if they mistype them.
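The language-dependent display of labels described above can be sketched in Python. The labels are those of figure 2.15; the `Concept` class, the helper functions and the choice of English as fallback language are assumptions of this sketch, not part of SKOS. Since SKOS allows at most one skos:prefLabel per language tag, a dictionary keyed by the tag is a natural fit for preferred labels.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    # At most one prefLabel per language tag, so a dict keyed by tag fits.
    pref_labels: dict = field(default_factory=dict)
    # Any number of alternative labels per language tag.
    alt_labels: dict = field(default_factory=dict)


def display_label(concept: Concept, user_lang: str, fallback: str = "en") -> str:
    """Label a user sees: prefLabel in the configured language, else the fallback language."""
    if user_lang in concept.pref_labels:
        return concept.pref_labels[user_lang]
    # Last resort: show the URI if no suitable label exists.
    return concept.pref_labels.get(fallback, concept.uri)


def all_labels(concept: Concept, lang: str) -> list:
    """Preferred label followed by the alternative labels for one language."""
    pref = [concept.pref_labels[lang]] if lang in concept.pref_labels else []
    return pref + concept.alt_labels.get(lang, [])


beach = Concept(
    uri="ex:beach",
    pref_labels={"en": "beach", "de": "Strand"},
    alt_labels={"en": ["shore", "coast"], "de": ["Ufer", "Küste"]},
)
```

A German-speaking user would thus see Strand, while a user with an unsupported language setting falls back to the English label beach.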
36 (Miles & Brickley, 2005)
37 (Miles & Brickley, 2005)
Figure 2.15: An RDF Graph labelling the SKOS concept beach: ex:beach (rdf:type skos:Concept) has the skos:prefLabels 'beach'@en and 'Strand'@de and the skos:altLabels 'shore'@en, 'coast'@en, 'Ufer'@de and 'Küste'@de (roughly based on (Miles & Brickley, 2005))
Listing 2.4: RDF/XML syntax of the SKOS concept beach with multilingual labels and hidden typos
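Hidden labels matter only to machines. The following Python sketch, assuming a simple case-insensitive exact-match index, shows how a search function could resolve preferred, alternative and hidden labels to a concept, while only preferred labels would ever be displayed. The misspellings `baech` and `Strnad` are hypothetical examples, not the content of listing 2.4, and `ex:beach` stands for the concept URI.

```python
# Hypothetical data: the labels of figure 2.15 plus two invented misspellings
# stored as hidden labels (not the actual content of listing 2.4).
BEACH = {
    "uri": "ex:beach",
    "prefLabel": {"en": "beach", "de": "Strand"},
    "altLabel": {"en": ["shore", "coast"], "de": ["Ufer", "Küste"]},
    "hiddenLabel": ["baech", "Strnad"],
}


def build_index(concepts: list) -> dict:
    """Map case-folded label text to the URIs of the concepts carrying that label."""
    index = {}
    for concept in concepts:
        labels = list(concept["prefLabel"].values())
        for alternatives in concept["altLabel"].values():
            labels.extend(alternatives)
        labels.extend(concept["hiddenLabel"])  # searchable, but never displayed
        for label in labels:
            index.setdefault(label.casefold(), set()).add(concept["uri"])
    return index


def search(index: dict, query: str) -> set:
    """Exact, case-insensitive lookup of a user query."""
    return index.get(query.strip().casefold(), set())


index = build_index([BEACH])
```

A user typing "Baech" is thus led to the concept beach, whose displayed label remains the correctly spelled preferred label.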
The labels discussed so far are lexical entities, i.e. essentially string literals. Consequently, they are not resources themselves and cannot be described further with metadata, because a literal cannot be the subject of an RDF statement. To add information about labels, such as the author of a particular label or when it was last updated, the W3C defined an extension for SKOS, the SKOS eXtension for Labels (SKOS-XL), which “[. . .] defines an extension for the Simple Knowledge Organization System, providing additional support for describing and linking lexical entities.”38
38(Miles & Brickley, 2009)
The “new” label properties skosxl:prefLabel, skosxl:altLabel and skosxl:hiddenLabel link a concept to instances of the class skosxl:Label. Each such instance has a skosxl:literalForm, which holds the text of the label, and on top of that any additional properties a user wants or needs. The use of the additional properties :lastEdited and :myCustomProperty for the concept beach is shown in listing 2.5.
Listing 2.5: Turtle syntax of the SKOS concept beach with added SKOS-XL properties (modified after (DuCharme, 2011))
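The difference between a plain label and a SKOS-XL label can be sketched in Python as follows. This is an illustration, not the content of listing 2.5: the class `XLLabel`, the function `label_triples`, the label identifier `ex:beach_label_en` and the metadata values are hypothetical. The point is that the concept is linked to a label resource, which carries its text via skosxl:literalForm and can be the subject of any further statement.

```python
from dataclasses import dataclass, field


@dataclass
class XLLabel:
    """A skosxl:Label: a resource of its own, so it can carry further statements."""
    uri: str
    literal_form: str
    lang: str
    extra: dict = field(default_factory=dict)  # additional, user-defined properties


def label_triples(concept: str, relation: str, label: XLLabel) -> list:
    """Triples linking a concept to an XL label (relation: skosxl:prefLabel/altLabel/hiddenLabel)."""
    triples = [
        (concept, relation, label.uri),  # the link points to a resource ...
        (label.uri, "rdf:type", "skosxl:Label"),
        (label.uri, "skosxl:literalForm", (label.literal_form, label.lang)),  # ... holding the text
    ]
    # Because the label is a resource, it can be the subject of further statements.
    triples += [(label.uri, prop, value) for prop, value in label.extra.items()]
    return triples


# Hypothetical label resource and metadata values, for illustration only.
beach_en = XLLabel("ex:beach_label_en", "beach", "en",
                   {":lastEdited": "2013-11-04", ":myCustomProperty": "some value"})
triples = label_triples("ex:beach", "skosxl:prefLabel", beach_en)
```

With plain skos:prefLabel, the statement ex:beach skos:prefLabel "beach"@en ends in a literal; the extra node introduced by SKOS-XL is what makes label metadata possible.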
(2) Documentation In addition to labels, terms can be described in more depth using documentation properties. SKOS offers the following seven properties for adding human-readable content to concepts39:
• skos:note – general documentation for any purpose
  – skos:definition – complete explanation of the meaning
  – skos:scopeNote – information about what is or is not included within the meaning (scope)
  – skos:example – show exemplary use
  – skos:historyNote – e.g. reflect or describe changes of a term
  – skos:editorialNote – e.g. reminders of future editorial work
  – skos:changeNote – document changes
Figure 2.16 and listing 2.6 show the use of two of these documentation properties. The concept beach has one definition in English and one in German. Because the English definition is not yet satisfactory, for instance, a skos:editorialNote additionally reminds the user(s) to come up with a better definition.
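The SKOS reference declares the six specialised documentation properties as sub-properties of skos:note. The following Python sketch, using the statements of figure 2.16 and a hypothetical `notes` helper, shows what that means in practice: asking for skos:note also returns definitions and editorial notes, whereas asking for skos:editorialNote yields, for example, an editor's to-do list. This is a deliberately simplified stand-in for RDFS reasoning, not a reasoner.

```python
# In the SKOS reference, these six documentation properties are sub-properties of skos:note.
NOTE_SUBPROPERTIES = {
    "skos:definition", "skos:scopeNote", "skos:example",
    "skos:historyNote", "skos:editorialNote", "skos:changeNote",
}


def notes(triples, subject: str, prop: str = "skos:note") -> list:
    """Documentation values of `subject` for `prop`.

    Asking for skos:note also returns the values of its sub-properties,
    a very simplified stand-in for what an RDFS reasoner would infer.
    """
    wanted = {prop} | (NOTE_SUBPROPERTIES if prop == "skos:note" else set())
    return [o for s, p, o in triples if s == subject and p in wanted]


# Statements of figure 2.16; literals are (text, language tag) pairs.
TRIPLES = [
    ("ex:beach", "skos:definition", ("A beach is the sand where water hits the land.", "en")),
    ("ex:beach", "skos:definition",
     ("Ein Strand ist ein flacher Küsten- oder Uferstreifen aus Sand oder Geröll. (Wikipedia)", "de")),
    ("ex:beach", "skos:editorialNote", ("Think of a better English definition.", "")),
]

# An editor's to-do list is simply the editorial notes.
todo = notes(TRIPLES, "ex:beach", "skos:editorialNote")
```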
39cf. (Miles & Brickley, 2005)
Figure 2.16: An RDF Graph documenting the SKOS concept beach: ex:beach has the skos:prefLabels 'beach'@en and 'Strand'@de, the skos:definitions 'A beach is the sand where water hits the land.'@en and 'Ein Strand ist ein flacher Küsten- oder Uferstreifen aus Sand oder Geröll. (Wikipedia)'@de (a flat strip of coast or shore made of sand or shingle), and the skos:editorialNote 'Think of a better English definition.' (roughly based on (Miles & Brickley, 2005))
Listing 2.6: RDF/XML syntax documenting the SKOS concept beach
(3) Semantic Relationships40 show that a concept relates to another concept by providing a link between them. These relations can be either hierarchical or associative. Figure 2.17 and listing 2.7 show a hierarchical relationship between the concepts beach and beach scarp: beach is the superordinate (skos:broader) term in relation to beach scarp, which in turn is a subordinate (skos:narrower) term of beach. The figure and listing also show two skos:related relationships, which are associative and form the last type of semantic relationships. In the example, beach is related to sand and to water and vice versa, since skos:related is a symmetric property.
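In SKOS, skos:broader and skos:narrower are inverse properties and skos:related is symmetric, so one direction of each link implies the other. The following Python sketch illustrates this with the relationships of figure 2.17: only one direction is asserted, and the `complete` helper (a hypothetical name) derives the rest. The identifier `ex:beachScarp` is an assumption for beach scarp.

```python
# Relationships of figure 2.17; "ex:beachScarp" is a hypothetical identifier for beach scarp.
ASSERTED = {
    ("ex:beachScarp", "skos:broader", "ex:beach"),
    ("ex:beach", "skos:related", "ex:sand"),
    ("ex:beach", "skos:related", "ex:water"),
}

# In SKOS, skos:broader and skos:narrower are inverse properties,
# and skos:related is symmetric.
INVERSE = {"skos:broader": "skos:narrower", "skos:narrower": "skos:broader"}
SYMMETRIC = {"skos:related"}


def complete(triples: set) -> set:
    """Return the triples plus those implied by inverse and symmetric properties."""
    result = set(triples)
    for s, p, o in triples:
        if p in INVERSE:
            result.add((o, INVERSE[p], s))
        if p in SYMMETRIC:
            result.add((o, p, s))
    return result


full = complete(ASSERTED)
narrower_of_beach = sorted(o for s, p, o in full if s == "ex:beach" and p == "skos:narrower")
```

Deriving the reverse links, instead of requiring editors to maintain both directions by hand, avoids inconsistencies such as a skos:narrower link without its matching skos:broader link.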
40cf. (Miles et al., 2005, p. 4) and (Miles & Bechhofer, 2009)
Figure 2.17: An RDF Graph showing the semantic relationships in relation to beach: ex:beach skos:narrower beach scarp, beach scarp skos:broader ex:beach, and ex:beach skos:related ex:sand and ex:water; both concepts also carry an English skos:prefLabel and skos:definition (roughly based on (Miles & Brickley, 2005))
Listing 2.7: RDF/XML syntax showing the semantic relationships in relation to beach
2.5 Directives in the marine and SDI domain
Directives are important drivers for the development of a marine SDI. In general, SDIs and directives are closely connected: SDIs support administrative activities, and directives are part of, or influence, these activities. For European countries such as Germany, these directives are set at the European level and are thus legislated by the European Union. The main directives affecting the marine domain are the Water Framework Directive (WFD, subsection 2.5.2) and the Marine Strategy Framework Directive (MSFD, subsection 2.5.3), but some of the annexes of the INSPIRE directive are also important for the marine domain (as already outlined in subsection 2.1.5 on page 14). These directives also have counterparts in German law, or existing German laws were adjusted to meet their requirements for Germany and its federal states. Examples include the Meeresstrategie-Rahmenrichtlinie (MSRL, i.e. the MSFD), the Wasserrahmenrichtlinie (WRRL, i.e. the WFD), the Fauna-Flora-Habitat-Richtlinie (FFH-RL, i.e. the Habitats Directive) and the Vogelschutzrichtlinie (VS-RL, i.e. the Birds Directive). Further relevant legislation includes the national law on access to spatial data (GeoZG) and the Environmental Information Act (UIG); relevant systems, bodies, conventions and initiatives include the Water Information System for Europe (WISE), the Baltic Marine Environment Protection Commission (HELCOM), the Convention for the Protection of the Marine Environment of the North-East Atlantic (OSPAR), the EU Shared Environmental Information System (SEIS) and Agenda 21 (United Nations).
2.5.1 INSPIRE
As subsection 2.1.4 pointed out, INSPIRE is a European directive on which a regional SDI (for Europe) will be built. This SDI is located at the regional level because it covers all EU member states and relies on their national SDIs. Furthermore, INSPIRE is an example of a legally enforced SDI, because it is a legal act (directive 2007/2/EC) of the Council of the European Union and the European Parliament. INSPIRE focuses on environmental policy and aims at strengthening the availability and accessibility of data by overcoming barriers such as incompatibility and inconsistency (already outlined in section 2.1). The challenge in achieving these goals is that INSPIRE is built by 27 different countries (and therefore has to support more than 23 languages) with very different information systems and professional and cultural practices. The scope of INSPIRE is defined by 34 themes, which fall into three categories or annexes: the first two focus on “fundamental datasets”, i.e. reference data such as coordinate reference systems, geographical names and elevation, while annex III covers data for environmental analysis and impact assessment, such as environmental monitoring facilities and sea regions (Craglia, 2010b).
INSPIRE architecture and services The architecture of INSPIRE is depicted in figure 2.18. The figure also shows the core component, i.e. the core resource: spatial data in spatial data sets. All other components (metadata, services and so on) are only needed to find, use, interpret or access these spatial data. Data access